LOC.gov pagination limits make it impossible to get all of big collections

The pagination limits on the LOC.gov API prevent requests from going past the first 100,000 items of a result set. This means the crawler can't fetch all of Chronicling America.

A sample log entry from the crawler:

cchc-crawler  | time="2021-08-13T03:47:34Z" level=warning msg="HTTP error when fetching from API" http_code=400 http_error="400 Bad Request" url="https://www.loc.gov/collections/chronicling-america/?at%21=aka%2Cbreadcrumbs%2Cbrowse%2Ccategories%2Ccontent%2Ccontent_is_post%2Cexpert_resources%2Cfacet_trail%2Cfacet_views%2Cfacets%2Cfeatured_items%2Cform_facets%2Clegacy-url%2Cnext%2Cnext_sibling%2Coptions%2Coriginal_formats%2Cpages%2Cpartof%2Cprevious%2Cprevious_sibling%2Cresearch-centers%2Cshards%2Csite_type%2Csubjects%2Ctimeline_1852_1880%2Ctimeline_1881_1900%2Ctimeline_1901_1925%2Ctimestamp%2Ctopics%2Cviews&c=1000&fa=online-format%3Aonline+text&fo=json&sp=101&st=list"

Requesting that URL directly does in fact return a 400 error. With c=1000 items per page, sp=101 is the first page that reaches past the 100,000-item mark.
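
To make the arithmetic concrete, here is a minimal sketch of how a crawler could detect this situation before sending the request. It assumes the cap is 100,000 items, which is only what the log above suggests and not a documented constant. The names `ITEM_LIMIT`, `last_reachable_page`, and `exceeds_limit` are hypothetical and are not part of cchc; the fallback page size of 25 when `c` is missing is also an assumption.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the cap observed in this issue, not a documented API constant.
ITEM_LIMIT = 100_000


def last_reachable_page(per_page: int, item_limit: int = ITEM_LIMIT) -> int:
    """Highest page number whose items all fall within the item limit."""
    return item_limit // per_page


def exceeds_limit(url: str, item_limit: int = ITEM_LIMIT) -> bool:
    """True if the page requested by `url` would reach past the item limit."""
    query = parse_qs(urlparse(url).query)
    # `c` is items per page and `sp` is the page number, as in the logged URL.
    per_page = int(query.get("c", ["25"])[0])  # fallback of 25 is an assumption
    page = int(query.get("sp", ["1"])[0])
    return page * per_page > item_limit
```

For the parameters in the logged URL (c=1000, sp=101), `exceeds_limit` returns `True`, while the same URL with sp=100 returns `False`. A check like this would only let the crawler stop cleanly at the boundary; it does not get around the limit itself.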

We probably need to ask whether there is a way around this.

Cf. #18.

lmullen / cchc

LOC.gov pagination limits make it impossible to get all of big collections #22