Automated scraping of markup and CSS from a list of relevant URLs, using a variety of user-agent strings. Reports on usage of CSS properties and apparent user-agent sniffing.
Moved item saving out of the spider and into the item pipeline
Fixed user agent bug that was found after switching to the item pipeline
Changed the directory structure used when saving into /media
Fixed a race condition; changed the model to use an FK constraint (site_hash, batch_fk)
A Sitescan is now identified by the start_url provided in the text file. The physical save location is still based on only the top-level domain, though.
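
A rough sketch of how saving in the pipeline and the domain-based save location might fit together. This is not the project's actual code: the class name SaveToMediaPipeline, the helper save_dir_for, and the item fields start_url, filename and body are all hypothetical, and "top level domain" is assumed here to mean the host name of the start_url. Only the process_item(item, spider) method shape follows the usual item-pipeline convention.

```python
import os
from urllib.parse import urlparse


def save_dir_for(start_url, media_root):
    """Return the save directory for a scan, keyed on the host name only.

    Assumption: the path and query of start_url are ignored, so every
    start_url on the same host maps to the same directory.
    """
    host = urlparse(start_url).hostname or "unknown"
    return os.path.join(media_root, host)


class SaveToMediaPipeline:
    """Hypothetical pipeline that writes each item's body to disk."""

    def __init__(self, media_root):
        self.media_root = media_root

    def process_item(self, item, spider):
        # The scan is identified by start_url, but the directory is
        # derived from the host alone.
        target = save_dir_for(item["start_url"], self.media_root)
        os.makedirs(target, exist_ok=True)
        path = os.path.join(target, item["filename"])
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(item["body"])
        # Return the item so any later pipeline stage still receives it.
        return item
```

Under these assumptions, two different start_url values on the same host share one directory, which matches the note above that the save location has not yet caught up with the start_url-based identity.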