The API's pagination limits prevent paging past 100,000 items: the request below fails at page 101 with 1,000 items per page. This means the crawler can't fetch all of Chronicling America.
A sample log entry from the crawler:
cchc-crawler | time="2021-08-13T03:47:34Z" level=warning msg="HTTP error when fetching from API" http_code=400 http_error="400 Bad Request" url="https://www.loc.gov/collections/chronicling-america/?at%21=aka%2Cbreadcrumbs%2Cbrowse%2Ccategories%2Ccontent%2Ccontent_is_post%2Cexpert_resources%2Cfacet_trail%2Cfacet_views%2Cfacets%2Cfeatured_items%2Cform_facets%2Clegacy-url%2Cnext%2Cnext_sibling%2Coptions%2Coriginal_formats%2Cpages%2Cpartof%2Cprevious%2Cprevious_sibling%2Cresearch-centers%2Cshards%2Csite_type%2Csubjects%2Ctimeline_1852_1880%2Ctimeline_1881_1900%2Ctimeline_1901_1925%2Ctimestamp%2Ctopics%2Cviews&c=1000&fa=online-format%3Aonline+text&fo=json&sp=101&st=list"
Requesting that URL directly does in fact return a 400 error.
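As a rough check of why that particular page fails, the paging parameters can be read off the URL and compared against the cap. This is only a sketch: the 100,000 figure is the limit observed here rather than a documented value, `sp` is assumed to be a 1-based page number, and the helper names are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the cap is the 100,000 items observed in this issue,
# not a value taken from the API documentation.
ITEM_CAP = 100_000


def last_item_index(url: str) -> int:
    """Return the 1-based index of the last item the request would cover."""
    query = parse_qs(urlparse(url).query)
    per_page = int(query["c"][0])  # items per page
    page = int(query["sp"][0])     # page number, assumed to start at 1
    return per_page * page


def exceeds_cap(url: str, cap: int = ITEM_CAP) -> bool:
    """Return True if the request reaches past the assumed item cap."""
    return last_item_index(url) > cap


# Shortened form of the failing URL from the log (paging parameters kept).
url = "https://www.loc.gov/collections/chronicling-america/?c=1000&fo=json&sp=101&st=list"
print(last_item_index(url))  # 101000
print(exceeds_cap(url))      # True
```

Under these assumptions, page 100 is the last one that stays within the cap at 1,000 items per page.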
We probably need to ask whether there is a way around this.
Cf. #18.