niemands / StashPlugins

A collection of python plugins for stash
GNU Affero General Public License v3.0

No handler for incorrect URLs during Bulk Media Scraping #12

Open mrcomicon opened 3 years ago

mrcomicon commented 3 years ago

In the bulk scraping function, when scraping a scene's URL returns null, the script assumes this is because of a missing scraper, when in fact it could be due to multiple reasons, such as an incorrect link. As a result, the script assumes that if one URL for a site doesn't work, no URL for that site will work, because the subroutine adds the URL's netloc to a blacklist. For example, say you have 20 scenes tagged for scraping, all from the site abc.abc, and the 5th scene to be scraped has an incorrect link: the first 4 scenes will be scraped successfully, but the script will simply skip scenes 6-20 because they have the same netloc as the incorrect link.

You can fix this by removing the portion of the code that adds the netloc and instead adding the whole URL to missing_scrapers. It could also be helpful to write this list out to a file for informational purposes. However, with that change you would no longer have protection against genuinely missing scrapers; perhaps there is a way to use the Stash interface to see which scrapers are loaded and add them to a whitelist at the beginning of the script?
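A minimal sketch of the change described above, assuming a bulk-scrape loop roughly like the plugin's; `scrape_scene_url`, the shape of the scene dict, and the output file name are illustrative assumptions, not the plugin's actual API:

```python
# Sketch only: record each failing URL instead of blacklisting its netloc.
# `scrape_scene_url` stands in for whatever call the plugin makes against the
# Stash scraper API (hypothetical name), and the scene dict shape is assumed.

missing_scrapers = []  # now holds the full URLs that failed, not blacklisted netlocs

def bulk_scrape(scenes, scrape_scene_url):
    for scene in scenes:
        url = scene.get("url")
        if not url:
            continue
        result = scrape_scene_url(url)
        if result is None:
            # Record only this URL; other scenes on the same netloc still get tried.
            missing_scrapers.append(url)
            continue
        # ... apply the scraped data to the scene here ...

    # Optional: write the failures out so the user can review them afterwards.
    if missing_scrapers:
        with open("failed_scrape_urls.txt", "w") as f:
            f.write("\n".join(missing_scrapers) + "\n")
```

The trade-off noted in the report still applies: without the netloc blacklist, scenes from a site that has no scraper at all will each be attempted and each land in `missing_scrapers`. The whitelist-at-startup idea would address that, but the exact Stash GraphQL query and fields for listing loaded scrapers depend on the Stash version, so it is left as prose here.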

niemands commented 3 years ago

Thank you for pointing this out, I will take a look into it