humandecoded / twayback

Automate downloading archived deleted Tweets.
Apache License 2.0

KeyError: 'closest' when parsing accounts with larger number of tweets. #3

Closed Traut89now closed 2 years ago

Traut89now commented 2 years ago

Hi,

after the latest update, the tool seems to work perfectly when scraping accounts with a lower number of archived tweets, but when I tried accounts with 1000+ archived tweets, the process failed with:

```
Traceback (most recent call last):
  File "C:\15tway\twayback B\twayback.py", line 112, in
    wayback_url = (jsonResponse['archived_snapshots']['closest']['url'])
KeyError: 'closest'
```

as also seen in the screenshot below.

(screenshot)
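For context, the `KeyError: 'closest'` happens because the Wayback Machine availability API returns an empty `archived_snapshots` object (`{"archived_snapshots": {}}`) when no snapshot is returned or the request is refused, so indexing `['closest']` directly fails. A minimal sketch of defensive parsing, assuming a hypothetical helper name (`extract_wayback_url`) rather than the tool's actual code:

```python
import json

def extract_wayback_url(response_text):
    """Return the closest snapshot URL, or None if the API gave no snapshot.

    The availability API can respond with {"archived_snapshots": {}} when no
    snapshot exists or the request was refused; using .get() instead of direct
    indexing avoids the KeyError: 'closest' seen in the traceback above.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None
```

With this approach the caller can check for `None` and decide whether to retry or skip the tweet.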

Mennaruuk commented 2 years ago

Thanks for reporting the issue. Yes, I've gotten the same error in the past, and it's frustrating. Basically, it's Archive.org's fault: for whatever reason, it sometimes decides to refuse the requests you're sending.

To mitigate this, I've made it so that if you get an error like this, Python waits 60 seconds and tries again; if that succeeds, it picks up right where it left off (similar to the screenshot below). I also added a progress bar so you can see the progress.

(screenshot)
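The wait-and-retry behavior described above can be sketched as a small loop. This is an illustrative sketch, not the tool's actual implementation: `fetch` is a hypothetical callable standing in for whatever performs the availability request, and the attempt limit is an assumed parameter.

```python
import time

def fetch_with_retry(fetch, max_attempts=5, wait_seconds=60):
    """Call fetch() until the response contains a 'closest' snapshot,
    sleeping between failed attempts, mirroring the retry idea above.

    Returns the snapshot URL, or None if every attempt came back empty.
    """
    for attempt in range(max_attempts):
        data = fetch()  # expected to return the parsed availability JSON
        closest = data.get("archived_snapshots", {}).get("closest")
        if closest is not None:
            return closest["url"]
        if attempt < max_attempts - 1:
            time.sleep(wait_seconds)  # back off before retrying
    return None
```

Because the loop resumes from the same request, a transient refusal by Archive.org only costs a pause rather than restarting the whole run.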

I've updated the release; please try again and let me know if you run into more errors.

P.S. I'm very close to finding a much better alternative to this method. It won't require connecting to Archive.org anymore, so you won't have to go through this pain again. Hopefully I can get it implemented this week!

P.P.S. It's implemented in the latest release! You won't have to worry about this error anymore :)

Traut89now commented 2 years ago

Works brilliantly now! And the speed improvements are fantastic...

Thanks!

xander8945 commented 2 years ago

I still seem to be dealing with this issue. Any idea why?

(screenshot)

Mennaruuk commented 2 years ago

Hi @xander8945, please download the latest version from Twayback's GitHub repo; the method above has been replaced with a better one.