Automated scraping of markup and CSS from a list of relevant URLs, using a variety of user-agent strings. Provides reporting on usage of CSS properties and apparent user-agent sniffing.
cleaned up manage command to take only filename as arg #8
This should use the `*args` that is passed to `handle` (which has already gone through option parsing) rather than pulling the value out of `self._argv` directly. Otherwise it will get confused if standard management command options (e.g. `--traceback`) are used, because it will think they're the filename. Also define the class variable `args` so that `manage.py help scraper` can show something more useful (see https://github.com/mozilla/moztrap/blob/master/moztrap/model/core/management/commands/import.py for an example).
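A minimal sketch of the failure mode, using only the standard library's `argparse` as a stand-in for Django's own option parsing. The helper names `naive_filename`, `build_parser` and `parsed_filename`, and the assumption that the old code read the filename from a fixed position in the raw argv, are hypothetical illustrations rather than the PR's actual code. In Django itself the fix is to take the filename from the positional `args` handed to `handle` (newer Django versions declare such positionals in `add_arguments` instead of the `args` class variable).

```python
import argparse


def naive_filename(argv):
    # Hypothetical "before" behaviour: grab the first argument after the
    # command name straight from the raw argv, as reading self._argv does.
    return argv[2]


def build_parser():
    # Hypothetical stand-in for the option parsing Django performs
    # before it calls handle().
    parser = argparse.ArgumentParser(prog="manage.py scraper")
    parser.add_argument("--traceback", action="store_true")
    parser.add_argument("filename")
    return parser


def parsed_filename(argv):
    # Parse everything after "manage.py <command>": standard options are
    # consumed by the parser, so only the positional filename remains.
    options = build_parser().parse_args(argv[2:])
    return options.filename


argv = ["manage.py", "scraper", "--traceback", "urls.txt"]
print(naive_filename(argv))   # --traceback
print(parsed_filename(argv))  # urls.txt
```

With no extra options both approaches agree, which is why the bug only shows up once someone passes `--traceback` or a similar standard flag.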
And while we're cleaning up the command-line UI, what about using a verb (either `scrape` or `crawl`) rather than the noun `scraper` for the management command name?