Scrape images

ryanamannion / pcgs_scraper

Programmatically scrape US coin data including prices from www.pcgs.com

Creative Commons Zero v1.0 Universal

5 stars, 1 fork

Scrape images #2

Open ryanamannion opened 3 years ago

ryanamannion commented 3 years ago

It would be nice to scrape images, or at least the URLs where the images can be found.
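A rough sketch of the "at least the URLs" option, using only the standard library. It collects the `src` of every `<img>` tag in a page and resolves relative paths against the page URL. The names `ImageURLCollector` and `extract_image_urls` are made up for illustration, and nothing here assumes anything about how the PCGS pages are actually laid out, so the result would likely need filtering down to the coin photos.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the page itself.
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in document order."""
    collector = ImageURLCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Storing only the URLs keeps the scraped data small, and the images themselves can be downloaded later if they are ever needed.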

ryanamannion commented 3 years ago

So this CoinFacts scraping addition for the narrative and the images is taking much longer than I anticipated.

I kept running into 429 (Too Many Requests) responses, so I had to add a time.sleep() before each CoinFacts URL request, which makes the whole run very slow. Right now the half-cents and cents data is scraping, and this category alone has taken over 20 minutes. I will probably have to upload the file to Google Drive so I can pull it down when I need it, and write a script to update it so I never have to wait this long again.
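One possible alternative to a fixed time.sleep() before every request is to sleep only after a 429 comes back, waiting longer each time. The sketch below is an assumption, not what the scraper currently does: `fetch_with_backoff` is a hypothetical helper, and `get` is any callable that returns an object with `status_code` and `headers` (for example `requests.get`).

```python
import time


def fetch_with_backoff(url, get, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call get(url), retrying with growing delays while the server answers 429."""
    for attempt in range(max_retries + 1):
        response = get(url)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Use the server's Retry-After hint when it is a plain number of
        # seconds; otherwise fall back to exponential backoff (1s, 2s, 4s, ...).
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries: {url}")
```

Usage would look like `fetch_with_backoff(url, requests.get)`. Requests that succeed the first time pay no delay at all, so the total run time depends on how often the server actually pushes back.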

ryanamannion commented 3 years ago

Improve requests: https://julien.danjou.info/python-and-fast-http-clients/

ryanamannion commented 3 years ago

The requests improvements resulted in more 429 error codes, so I had to slow them down and request only one coin per second. It ended up taking a while, and I ran into several other error codes. I managed to get all of the coins downloaded, but it was messy: I had to combine dicts from different save points after an error caused the whole run to fail.

All of these changes still need to be merged in, since they were made on the fly on my server, which I do not have SSH access to right now.
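The save-point juggling described above could be made less manual with something like the following sketch. Everything in it is hypothetical (the function names, the JSON checkpoint file, and the `scrape_one` callable that fetches a single coin); it only illustrates resuming from a checkpoint and merging several checkpoint dicts.

```python
import json
from pathlib import Path


def save_checkpoint(coins, path):
    """Write the coins scraped so far to a JSON file."""
    Path(path).write_text(json.dumps(coins))


def merge_checkpoints(paths):
    """Combine several checkpoint files; later files win on duplicate keys."""
    merged = {}
    for path in paths:
        merged.update(json.loads(Path(path).read_text()))
    return merged


def scrape_all(coin_ids, scrape_one, checkpoint_path, every=100):
    """Scrape each coin, skipping any already stored in the checkpoint file."""
    path = Path(checkpoint_path)
    coins = json.loads(path.read_text()) if path.exists() else {}
    try:
        for n, coin_id in enumerate(coin_ids, start=1):
            key = str(coin_id)  # JSON object keys are always strings
            if key in coins:
                continue
            coins[key] = scrape_one(coin_id)
            if n % every == 0:
                save_checkpoint(coins, path)
    finally:
        # Save whatever was collected, even if scrape_one raised.
        save_checkpoint(coins, path)
    return coins
```

After a failure, calling `scrape_all` again with the same checkpoint path would pick up where the last run stopped instead of starting over. The one-coin-per-second limit can live inside `scrape_one`.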

ryanamannion commented 3 years ago

Initially implemented in commit 2c0ebbcb51d87e4ed50c16efa0691b926699843c