Scrape --all in parallel

nicohman / rust-wildbow-scraper

Automatically scrapes wildbow's web serials and compiles them into ebooks

GNU General Public License v3.0

95 stars, 20 forks

It would be nice to have more concurrency in the scraping, such as scraping all of the books at the same time, or all of the chapters at the same time. Obviously this would require some re-architecting, but it would be a big UX improvement. A flag to reject covers every time, instead of having to babysit --all and type "n" at each prompt, might also be nice.
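A minimal sketch of book-level concurrency, assuming each book can be scraped independently of the others. `scrape_book` is a hypothetical stand-in that only sleeps, not the scraper's real function, and the book names and costs are arbitrary placeholders. It uses `std::thread::scope` (Rust 1.63+), so the wall-clock time is roughly that of the slowest book rather than the sum of all of them.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for scraping one serial: it sleeps for
/// `cost_ms` milliseconds and returns the name of the ebook it "built".
fn scrape_book(name: &str, cost_ms: u64) -> String {
    thread::sleep(Duration::from_millis(cost_ms));
    format!("{name}.epub")
}

/// Scrape every book on its own scoped thread and collect the results
/// in the original order. Total time is roughly that of the slowest book.
fn scrape_all(books: &[(&str, u64)]) -> Vec<String> {
    thread::scope(|s| {
        // One scoped thread per book; scoped threads may borrow `books`.
        let handles: Vec<_> = books
            .iter()
            .map(|&(name, cost)| s.spawn(move || scrape_book(name, cost)))
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Arbitrary placeholder costs in milliseconds, not measurements.
    let books = [("worm", 50), ("pact", 30), ("twig", 40), ("ward", 60), ("pale", 120)];
    let start = Instant::now();
    let epubs = scrape_all(&books);
    println!("{:?} in {:?}", epubs, start.elapsed());
}
```

Scoped threads avoid wrapping the borrowed book list in `Arc`; a thread per book is reasonable here because there are only a handful of serials.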

Also, the installation instructions are unintuitive for non-technical people. I am downloading this for my non-technical brother, who uses Windows. For example, using git in the install instructions, instead of recommending downloading the repository as a zip from GitHub, probably adds many points of failure for non-technical people.

How much of a UX improvement would it really provide? Testing on my machine shows that scraping the books concurrently makes the run take as long as the longest book, which is Pale at 2 min, whereas scraping them all with --all takes 5 min. It's not like anyone is going to read a full wildbow serial in 5 min. Chapter concurrency would make it much faster, but it would require a total overhaul of how we find the chapters and how we add them to the epub. I'm not sure that is worth it.
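For chapter-level concurrency, one way to keep the epub in reading order is to fetch chapters on a small pool of worker threads, store each result by its index, and then append them sequentially. This sketch assumes the full list of chapter URLs is known up front (for example from a table-of-contents page), which may not match how the scraper currently discovers chapters; `fetch_chapter` and `fetch_in_order` are hypothetical names and the URLs are made up.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for downloading one chapter's HTML.
fn fetch_chapter(url: &str) -> String {
    format!("<p>contents of {url}</p>")
}

/// Fetch all chapters using `workers` threads, then return them in
/// table-of-contents order so they can be appended to the epub one by one.
fn fetch_in_order(urls: &[String], workers: usize) -> Vec<String> {
    let workers = workers.max(1); // always use at least one worker
    let next = AtomicUsize::new(0); // index of the next unclaimed chapter
    let results: Mutex<Vec<Option<String>>> = Mutex::new(vec![None; urls.len()]);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each worker claims the next chapter index atomically.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= urls.len() {
                    break;
                }
                let html = fetch_chapter(&urls[i]);
                // Store by index, so completion order does not matter.
                results.lock().unwrap()[i] = Some(html);
            });
        }
    });
    // All workers have been joined by the end of the scope; every slot is filled.
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|c| c.expect("every chapter was fetched"))
        .collect()
}

fn main() {
    // Made-up URLs standing in for a table of contents.
    let urls: Vec<String> = (1..=10)
        .map(|i| format!("https://example.com/chapter-{i}"))
        .collect();
    for html in fetch_in_order(&urls, 4) {
        println!("{html}");
    }
}
```

Capping `workers` keeps the load on the serial's site modest, and the epub-building step can stay sequential because the chapters come back already ordered.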

There is already a flag to reject covers every time: --covers false. If you want to accept covers every time, use -c true.
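A sketch of how a tri-state cover setting lets --all run unattended: `None` means the flag was not passed and the user is prompted, while `Some(true)` or `Some(false)` answers without prompting. `use_cover` and its signature are hypothetical, not the scraper's actual code, and the command-line parsing that would produce the `Option<bool>` is omitted.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical cover decision for one book. `covers` mirrors a
/// `--covers true|false` flag and is `None` when the flag was not given.
fn use_cover<R: BufRead, W: Write>(
    covers: Option<bool>,
    book: &str,
    input: &mut R,
    out: &mut W,
) -> io::Result<bool> {
    match covers {
        // Flag given: answer without prompting.
        Some(answer) => Ok(answer),
        // No flag: ask interactively and accept only "y" (any case).
        None => {
            write!(out, "Use a cover for {book}? (y/n) ")?;
            out.flush()?;
            let mut line = String::new();
            input.read_line(&mut line)?;
            Ok(line.trim().eq_ignore_ascii_case("y"))
        }
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut input = stdin.lock();
    let mut out = io::stdout();
    // With the flag set, nothing is printed and nothing is read.
    let chosen = use_cover(Some(false), "Worm", &mut input, &mut out)?;
    println!("use cover: {chosen}");
    Ok(())
}
```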

Unfortunately, this is GitHub, where most people assume a level of competency on the command line. I'm also having a hard time imagining a hypothetical user who can't use git but can use cargo. We should have more releases, but I think they should be full executables. I opened an issue to create a Windows .exe release.

Scrape --all in parallel #58