Automated scraping of markup and CSS from a list of relevant URLs, using a variety of user-agent strings. Provides reporting on usage of CSS properties and apparent user-agent sniffing.
Fixed a bug where a site containing two links to the same page emitted two identical requests, so the page was saved twice.
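As an illustration only, duplicate links can be collapsed before any requests are issued. This is a minimal sketch, not the project's actual fix: the helper name `unique_links` is hypothetical, and treating URLs that differ only in their `#fragment` as the same page is an assumption.

```python
from urllib.parse import urldefrag


def unique_links(links):
    """Yield each link once, in first-seen order.

    Assumption: two URLs that differ only in their #fragment
    point at the same page, so the fragment is dropped.
    """
    seen = set()
    for link in links:
        key, _fragment = urldefrag(link)
        if key not in seen:
            seen.add(key)
            yield key
```

Calling `list(unique_links(links))` on the links extracted from one page would then give one request per distinct page.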
Replaced the default Scrapy depth middleware with a modified one that allows JS and CSS to be saved one level deeper than the depth limit. For example, a page scanned at depth 2 may reference a stylesheet at depth 3; with this modification, that stylesheet is still saved.
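The relaxed depth rule can be sketched as a plain function, independent of Scrapy's middleware API. The names `ASSET_EXTENSIONS`, `is_asset`, and `within_depth` are hypothetical, and detecting JS and CSS by file extension is an assumption; the real middleware may identify them differently.

```python
from urllib.parse import urlparse

# Assumption: JS and CSS are recognised by their file extension.
ASSET_EXTENSIONS = (".js", ".css")


def is_asset(url):
    """Return True if the URL path ends in a JS or CSS extension."""
    return urlparse(url).path.lower().endswith(ASSET_EXTENSIONS)


def within_depth(url, depth, max_depth):
    """Pages must be within max_depth; JS and CSS get one extra level."""
    limit = max_depth + 1 if is_asset(url) else max_depth
    return depth <= limit
```

With a depth limit of 2, `within_depth("http://example.com/style.css", 3, 2)` is accepted, while an HTML page at depth 3 is not.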
Modified the offsite middleware to allow remote JS and CSS, because many sites serve these files from CDNs.
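The offsite exception can be sketched the same way, again as a standalone function rather than actual Scrapy middleware code. The name `should_follow` is hypothetical, and matching assets by `.js` or `.css` extension is an assumption.

```python
from urllib.parse import urlparse


def should_follow(url, allowed_domains):
    """Allow on-site URLs, plus JS and CSS from any host (e.g. a CDN)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # On-site: the host is an allowed domain or one of its subdomains.
    onsite = any(host == d or host.endswith("." + d) for d in allowed_domains)
    # Assumption: remote assets are recognised by file extension.
    asset = parsed.path.lower().endswith((".js", ".css"))
    return onsite or asset
```

For a crawl restricted to `example.com`, a script at `https://cdn.example.net/lib.js` would be fetched, but an HTML page on another domain would still be filtered out.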