Failing to Download Requested ZIP Files

The download of congressional records fails if the govinfo.gov server cannot deliver the requested ZIP file within two minutes (i.e., four attempts at 30-second intervals). Here is an example from the cr2.log file:

INFO:root:Sending request on 2022-06-28 14:09
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=1, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=0, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:root:Request headers received with code 503
WARNING:root:Unexpected condition, not continuing: 503
WARNING:root:Failed to download file https://www.govinfo.gov/content/pkg/CREC-2018-01-10.zip
WARNING:root:fdsysDL received report that download for 2018-01-10 did not complete.
INFO:root:No record on this day, not trying to extract
WARNING:root:[Errno 2] No such file or directory: 'output/2018/CREC-2018-01-10/mods.xml', skipping.

The ZIP file in the above example does exist. The govinfo.gov server simply takes more than two minutes to create it on the fly. Once the ZIP file has been created, it remains available for download (for an unknown length of time, though at least 24 hours). Running the download code again a couple of minutes after the first failure then retrieves the file without trouble.

One possible solution is to first request all the congressional records in question so that the server starts creating the ZIP files, then make a second pass to collect them, in the hope that the server has had enough time to create them.
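To make the two-pass idea concrete, here is a minimal sketch. It is not taken from the project's code: the names `two_pass_download` and `http_fetch`, the `output` directory, and the 120-second pause are all assumptions for illustration. Only the URL pattern comes from the log above.

```python
import os
import time
import urllib.request

# URL pattern as it appears in the cr2.log excerpt.
URL_TEMPLATE = "https://www.govinfo.gov/content/pkg/CREC-{day}.zip"


def http_fetch(url, out_dir="output"):
    """Hypothetical helper: download one ZIP, return True on success."""
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, url.rsplit("/", 1)[-1])
    try:
        with urllib.request.urlopen(url, timeout=60) as response:
            data = response.read()
    except OSError:
        # Covers HTTP errors such as 503 as well as timeouts.
        return False
    with open(target, "wb") as handle:
        handle.write(data)
    return True


def two_pass_download(days, fetch=http_fetch, wait_seconds=120.0, sleep=time.sleep):
    """Request every day once, wait, then retry only the days that failed.

    Returns the list of days that are still missing after the second pass.
    """
    # First pass: the request itself should prompt the server to build the ZIP.
    pending = [day for day in days if not fetch(URL_TEMPLATE.format(day=day))]
    if pending:
        # Give the server time to finish creating the files.
        sleep(wait_seconds)
        # Second pass: collect whatever was not ready the first time.
        pending = [day for day in pending if not fetch(URL_TEMPLATE.format(day=day))]
    return pending
```

Calling `two_pass_download(["2018-01-10", "2018-01-11"])` would return the dates that still failed after the second pass, which could then be logged or retried later. The wait time is a guess and may need tuning against how long the server actually takes.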

Moreover, I noticed that the code runs through every date within a given range to check whether files are available, which seems a waste of resources. I suggest using the govinfo API (https://api.govinfo.gov/docs/) to check first which congressional records are actually available within a given date range.
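A rough sketch of that lookup follows. It assumes the API's `published` endpoint accepts a start and end issue date plus `collection`, `offsetMark`, `pageSize`, and `api_key` query parameters, and that each entry in the returned `packages` list carries a `dateIssued` field. Both assumptions should be checked against the API documentation. The function names are hypothetical, and pagination is left out for brevity.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.govinfo.gov"


def published_url(start, end, api_key, page_size=100):
    """Build the query URL for CREC packages issued between start and end.

    The endpoint path and parameter names are assumptions based on the
    public API documentation, not on the project's code.
    """
    query = urllib.parse.urlencode({
        "collection": "CREC",
        "offsetMark": "*",
        "pageSize": page_size,
        "api_key": api_key,
    })
    return f"{API_ROOT}/published/{start}/{end}?{query}"


def issued_dates(payload):
    """Extract the sorted, de-duplicated issue dates from a decoded response."""
    packages = payload.get("packages", [])
    return sorted({pkg["dateIssued"] for pkg in packages if "dateIssued" in pkg})


def available_dates(start, end, api_key):
    """Fetch the first page of results and return the dates that have records."""
    with urllib.request.urlopen(published_url(start, end, api_key), timeout=60) as response:
        return issued_dates(json.load(response))
```

With something like this, the downloader could iterate over `available_dates("2018-01-01", "2018-01-31", api_key)` rather than over every calendar day in the range. A complete version would also need to follow the pagination links for ranges with more results than one page holds.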
