Closed divergentdave closed 9 years ago
I reviewed each changed inspector, and all the changes are positive and work fantastically. Thanks, @divergentdave! I added a few minor commits that rewrite report/landing page URLs to be HTTPS for a few IGs that have since migrated. In a few cases, we were using HTTPS for their hardcoded URLs, but their HTTPS page was still linking to the HTTP versions of reports and landing pages.
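For anyone curious, the kind of HTTP-to-HTTPS rewrite described above could look roughly like the sketch below. This is only an illustration, not the code from these commits: the function name `upgrade_to_https`, the `HTTPS_HOSTS` set, and the `oig.example.gov` host are all made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of IG hosts assumed to have migrated to HTTPS.
HTTPS_HOSTS = {"oig.example.gov"}


def upgrade_to_https(url, https_hosts=HTTPS_HOSTS):
    """Return url with an http scheme switched to https for known hosts.

    URLs on other hosts, and URLs that are already https, are returned
    unchanged. An explicit port in the URL is left as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in https_hosts:
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

Restricting the rewrite to a list of known hosts avoids upgrading links to sites that may not serve HTTPS at all.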
Also, I'm having trouble getting the hhs scraper to download anything to disk and create data in the data/ directory -- but I'm nearly positive that has nothing to do with this PR, so just flagging it before I merge.
The HHS scraper is working for me. Are you letting it run to completion? That particular scraper is two-pass, and the first pass takes forever, before it gets to saving reports.
Ohhhhh, yeah, I'm sure that's it. No, I wasn't letting it go to completion, which explains it.
This fixes a variety of scraper issues that have been building up, including 4 partial rewrites for new sites, several missing dates, and support for .docx files.
Since this adds a dependency on the python-docx module, deploying will require running pip install -r requirements.txt again.