Wiki Scraping Time

acroutworst commented 7 years ago

Hello there,

I have been running scrape_wiki.py to scrape Wikiart. It has been running for well over an hour and is still going! Is it expected for this process to take this long?

By the way, here is a preview of what I am seeing:

Cheers, Adam

rkjones4 commented 7 years ago

Hi Adam,

Unfortunately, this is the expected behavior: there are over 80,000 full-size portraits to download, so the script can take up to 10 hours to run. I don't believe there is currently a way to download a copy of the dataset directly, because the dataset for this project needs to be sorted into buckets by genre. If you need the script to run faster, you could try reducing the number of pages the script scrapes for each genre (see the comments in the code), or download a different version of the dataset and find a way to sort it by genre.
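A minimal sketch of the per-genre page cap suggested above, with a back-of-the-envelope runtime estimate. The function names, the genre list, and the 60-images-per-page figure are hypothetical and are not taken from scrape_wiki.py. The estimate assumes runtime scales linearly with the number of images downloaded, using the figures from this thread (roughly 80,000 images in up to 10 hours, i.e. about 0.45 s per image).

```python
# Hypothetical sketch: none of these names come from scrape_wiki.py.
# Assumes the scraper walks a fixed number of listing pages per genre,
# that every page yields about the same number of images, and that
# runtime scales linearly with the number of images downloaded.

FULL_RUN_IMAGES = 80_000   # "over 80000 full size portraits" (from this thread)
FULL_RUN_HOURS = 10.0      # "up to 10 hours" (from this thread)
IMAGES_PER_PAGE = 60       # assumption for illustration only


def cap_pages(pages_per_genre: dict[str, int], max_pages: int) -> dict[str, int]:
    """Limit how many listing pages are scraped for each genre."""
    return {genre: min(pages, max_pages) for genre, pages in pages_per_genre.items()}


def estimate_hours(pages_per_genre: dict[str, int]) -> float:
    """Scale the full-run time by the number of images that will be fetched."""
    images = sum(pages_per_genre.values()) * IMAGES_PER_PAGE
    return images * FULL_RUN_HOURS / FULL_RUN_IMAGES


if __name__ == "__main__":
    # Hypothetical genres and page counts, for illustration only.
    full = {"portrait": 40, "landscape": 50, "abstract": 30}
    reduced = cap_pages(full, max_pages=10)
    print(reduced)
    print(f"{estimate_hours(full):.2f} h -> {estimate_hours(reduced):.2f} h")
```

Because the estimate is linear, halving the pages per genre roughly halves the runtime; in practice network latency and server-side throttling will make the real numbers noisier than this.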

acroutworst commented 7 years ago

Yes, I will try changing the number of pages the script scrapes for each genre. Thanks for the suggestion.

adam-hanna commented 6 years ago

Could we maybe host the dataset so we don't destroy Wikiart's servers? Maybe a torrent?

rkjones4 / GANGogh

Wiki Scraping Time #1