Closed shanecav84 closed 7 years ago
Awesome, thanks for doing this! I'll take a look at it. I'm going to add a commit to rename eeoc-new to eeoc, as we don't need to keep the old scraper around.
Regarding the old reports that are no longer online, we still have copies of the PDFs and associated metadata. I'll find which reports are now missing, clean them up, and add them over at the unitedstates/reports repository.
Reports with multiple files are a longstanding hairy issue (see #112). In cases such as that example, we usually pick the most important/substantive file, dropping transmittal letters or memoranda. The EPA OIG scraper is a good example of that approach.
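As a rough illustration of that "pick the substantive file" idea, here is a minimal sketch. It is not the code from the EPA OIG scraper; the function name `pick_primary_file`, the `(title, url)` input shape, and the title pattern are all assumptions made up for this example.

```python
import re

# Hypothetical pattern for cover documents; a real scraper may use different rules.
SKIP_PATTERN = re.compile(r"transmittal|memorand(um|a)", re.IGNORECASE)


def pick_primary_file(files):
    """Return the first (title, url) pair whose title does not look like a cover document.

    Falls back to the first file if every title matches the pattern,
    and returns None if the list is empty.
    """
    for title, url in files:
        if not SKIP_PATTERN.search(title):
            return (title, url)
    return files[0] if files else None


# Example input with placeholder URLs.
files = [
    ("Transmittal Letter", "https://example.com/letter.pdf"),
    ("Audit Report", "https://example.com/report.pdf"),
]
primary = pick_primary_file(files)
```

The fallback to the first file is a design choice for the sketch, so that a report whose only attachment is a memorandum is still kept rather than dropped.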
Thanks again for tackling this!
Thanks, @divergentdave, for rounding out the PR!
Looks good, merging and deploying!
Edit: Up at https://oversight.garden/reports?inspector=eeoc
Closes #247.
--archive
eeoc.py
in for now. TODO: