radiolarian / AO3Scraper

A Python scraper for getting fan fiction content and metadata from Archive of Our Own.

No way to restart ao3_work_ids.py without starting over #3

Closed by cmd16 6 years ago

cmd16 commented 6 years ago

I know ao3_get_fanfics.py has a restart command line argument, but ao3_work_ids.py doesn't have the same functionality. When downloading a lot of fanfiction, it is quite frustrating to have to start all over again just because I lost my internet connection. Please add a restart command line argument to ao3_work_ids.py, or explain how I can resume scraping without starting from the beginning. Thank you.

ssterman commented 6 years ago

ao3_work_ids saves the current page URL in the output CSV. You can restart the scrape from the same point by passing that URL as your command line argument. If you save to the same CSV, make sure to delete the extra header line that will be written when the scrape restarts. Note that because ao3_work_ids uses the AO3 search page to retrieve stories, any change to the search results between runs (for example, newly posted works) will change the results of the work ID scrape, so a resumed scrape may not line up exactly with the pages already collected.
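
For anyone who would rather script the manual steps above, here is a minimal sketch. It is not part of AO3Scraper. It assumes the saved page URL is the last cell in the CSV that starts with `http://` or `https://`, and that the extra header line written on restart is identical to the first line of the file. The helper names (`find_last_url`, `drop_repeated_headers`, `load_rows`, `save_rows`) are made up for illustration, so check them against your own output file before relying on this.

```python
import csv


def load_rows(path):
    """Read every row of the CSV at `path` into a list of lists."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def save_rows(path, rows):
    """Write `rows` back out to `path`, overwriting the file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def find_last_url(rows):
    """Return the last cell that looks like a URL, or None if there is none.

    Assumption: the page URL saved by the scraper is stored as a plain
    http(s) URL somewhere in the file.
    """
    for row in reversed(rows):
        for cell in reversed(row):
            if cell.startswith(("http://", "https://")):
                return cell
    return None


def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows identical to it.

    Assumption: the header written on restart matches the original header.
    """
    if not rows:
        return []
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]
```

Before restarting, call `find_last_url(load_rows(path))` to get the URL to pass as the command line argument. After the restarted scrape has appended to the same file, `save_rows(path, drop_repeated_headers(load_rows(path)))` removes the duplicate header. Keep a backup copy of the CSV first, since `save_rows` overwrites it.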