Contrary to our README, SIGIR reports were available for download, and http://www.sigir.mil/ now points to a fairly comprehensive site archive "hosted by the University of North Texas Cyber Cemetery in association with the National Archives and Records Administration." We should scrape these to get them into the system, though of course there won't be any new reports.
This scraper should check whether `year_range` intersects with SIGIR's years of operation. That way the scraper doesn't hammer the server by default, but running it with `--archive` re-scrapes the site.
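
A rough sketch of that check might look like the following. It assumes `year_range` is an iterable of integer years and that `--archive` widens the range to cover past years. The names `SIGIR_YEARS`, `ARCHIVE_FIRST_YEAR`, `build_year_range`, and `should_scrape` are made up for illustration, and the 2004 to 2013 span is an assumption that should be confirmed before use.

```python
from datetime import date

# Assumed years of operation for SIGIR; verify before relying on them.
SIGIR_YEARS = range(2004, 2014)  # 2004 through 2013 inclusive

# Hypothetical earliest year requested when --archive is passed.
ARCHIVE_FIRST_YEAR = 2004


def build_year_range(archive=False, today=None):
    """Return the years to scrape: only the current year by default,
    or everything from ARCHIVE_FIRST_YEAR onward when archive is True."""
    current_year = (today or date.today()).year
    first_year = ARCHIVE_FIRST_YEAR if archive else current_year
    return range(first_year, current_year + 1)


def should_scrape(year_range):
    """True when any requested year falls within SIGIR's years of operation."""
    return any(year in SIGIR_YEARS for year in year_range)
```

With this shape, a default run requests only the current year, finds no overlap, and skips the archive entirely, while a run with `--archive` overlaps the assumed years of operation and fetches the site again.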