Now that the program can crawl multiple pages, a good next step is to make it possible to update already-downloaded files without redownloading everything in /web_novels/.
This will involve:
- making a config file that records which pages have been crawled
- comparing the next page to be crawled against that record
- appending to the file only if the page has not already been crawled
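The steps above could be sketched roughly as follows. The config file name, the JSON format, and the function names are all assumptions for illustration, not part of the actual program:

```python
import json
from pathlib import Path

# Hypothetical config file holding the set of already-crawled page URLs.
CONFIG_PATH = Path("crawled_pages.json")

def load_crawled(path=CONFIG_PATH):
    """Return the set of URLs recorded as already crawled."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def save_crawled(urls, path=CONFIG_PATH):
    """Persist the crawled-URL set back to the config file."""
    path.write_text(json.dumps(sorted(urls)))

def should_append(next_url, crawled):
    """Append the page to the novel file only if it is not already recorded."""
    return next_url not in crawled
```

A crawl loop would then call `should_append` before writing each chapter, add the URL to the set afterward, and `save_crawled` at the end of the run.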
It might be better to include a last_crawled variable in the page template rather than comparing against a list of crawled pages, so an update only occurs if there is a new link to follow to the next page. This will definitely work for Royal Road, but I'm unsure how well it will generalize to other sites.