File request returns "generating file" message instead of 404

unitedstates / congressional-record

A parser for the Congressional Record.

File request returns "generating file" message instead of 404 #41

Closed. danielmroberts closed this issue 4 years ago

danielmroberts commented 4 years ago

We recently started receiving a "File is not a zip file" error and found that the saved zip file was actually HTML, with a message saying the file was being generated and to check back in 30 seconds.

I added the following bit of code to downloadRequest() in govinfo/downloader.py to work around the issue temporarily, but I'm not sure it is the proper long-term solution.

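            # Workaround: govinfo can answer 200 with an HTML "Generating File" page instead of a ZIP.
            # If r.data is bytes (as with urllib3 on Python 3), test for b'Generating File' instead.
            elif r.status == 200 and 'Generating File' in r.data:
                logging.warning('Received 200 but content indicates file generating or not available.')
                self.status = 404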

nclarkjudd commented 4 years ago

Thanks for this bug report! Could you provide any additional scope conditions on reproducing the error? For instance, does this only happen when attempting to access very recent days of the Record, or does it occur when attempting to access older archives as well?

danielmroberts commented 4 years ago

This response occurs on days for which no zip file is available. That can be a day in the past, or today before a file has been made available.

nclarkjudd commented 4 years ago

That's helpful, thanks!

I previously submitted https://github.com/unitedstates/congressional-record/commit/05924a4494b755b2d0eeaa27144e77aa386f01f9 to address an earlier iteration of this case.

When I have a chance, I will look into an approach that can catch this specific error and the more general case of r.data not actually being a ZIP file.

Of course, if you think of one first, I would welcome a pull request!

edit: Just to note that the parser now fails tests that it passed a year ago, so the test suite would have caught this if it had been run recently.

nclarkjudd commented 4 years ago

Decided to just figure this out. I think it's all set. The new code should catch any and all instances of govinfo returning something that's not a ZipFile, so there shouldn't be a need for any additional whack-a-mole.
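
For readers who want to see what a general "is this really a ZIP?" check can look like, here is a minimal sketch. It is not the code that was committed to the repository. The function names `looks_like_zip` and `classify_response` are hypothetical, and it assumes the response body is available as bytes. It relies only on the standard library's `zipfile.is_zipfile`, which accepts a file-like object.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the payload has a valid ZIP end-of-central-directory record."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def classify_response(status: int, payload: bytes) -> int:
    """Hypothetical helper: treat a 200 whose body is not a ZIP archive as a 404."""
    if status == 200 and not looks_like_zip(payload):
        return 404
    return status
```

Checking the archive structure rather than matching the "Generating File" text means the check does not depend on the exact wording of the placeholder page.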