Open receptakill opened 2 years ago
OK, revisiting this 3 months later with the latest version (2.1.2-3) and Java 18: behavior for Twitter rips appears unchanged. For some Twitter accounts with a large number of tweets (including RTs and non-media tweets), there appears to be an arbitrary point beyond which the ripper simply fails to fetch any more tweets. Testing various accounts, ripme seems to tap out consistently at retrieval 17, at 200 tweets per retrieval, so it effectively quits after roughly 3,400 tweets (17 × 200) have been processed.
Here is another account prolific enough to hit the problem: https://twitter.com/0zmnds/media (SFW). I attached a ripme log for it. The oldest image downloaded (at the bottom of the log) dates back only about two months, yet this account has been posting images for years.
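To make the arithmetic above concrete, here is a minimal sketch of newest-to-oldest paging with a fixed page budget. It is not ripme's code: the class and method names are hypothetical, and the timeline is simulated with plain integer IDs rather than fetched from Twitter. One possible explanation for the cutoff, unconfirmed in this report, is that the v1.1 `statuses/user_timeline` endpoint is documented to return at most about 3,200 of a user's most recent tweets, with native retweets counted toward that total.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of max_id-style paging; not taken from ripme.
class PagingBudgetSketch {
    static final int PAGE_SIZE = 200; // tweets per retrieval, as observed in the report

    // Simulated fetch: returns up to PAGE_SIZE ids <= maxId, newest first.
    // The fake timeline consists of the ids total, total-1, ..., 1.
    static List<Long> fetchPage(long maxId, int total) {
        List<Long> page = new ArrayList<>();
        for (long id = Math.min(maxId, total); id >= 1 && page.size() < PAGE_SIZE; id--) {
            page.add(id);
        }
        return page;
    }

    // Counts how many tweets are seen before the page budget or the timeline runs out.
    static int tweetsSeen(int totalTweets, int maxPages) {
        long maxId = totalTweets; // start at the newest simulated tweet
        int seen = 0;
        for (int page = 0; page < maxPages; page++) {
            List<Long> batch = fetchPage(maxId, totalTweets);
            if (batch.isEmpty()) {
                break; // nothing older left
            }
            seen += batch.size();
            maxId = batch.get(batch.size() - 1) - 1; // continue just below the oldest id seen
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(tweetsSeen(10000, 17)); // 17 pages of 200 = 3400
        System.out.println(tweetsSeen(500, 17));   // short timeline: all 500
    }
}
```

With a 17-page budget the sketch stops at 3,400 tweets however long the simulated timeline is, which matches the stopping point described above.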
Is this a rate limit issue? The logger throws no errors related to it; it simply concludes the rip as if nothing is wrong. Futzing with the download threads, twitter.max_requests and twitter.rip_retweets, even twitter.max_items_requests, does not change the behavior.
If it's a rate limit issue, can the ripper be updated to complete a full Twitter rip across several sessions? Or, barring that, perhaps let users specify a starting status ID to crawl back from in rip.properties, so they can crawl through old tweets manually?
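As a rough illustration of the "starting status ID" idea, the sketch below builds a v1.1 `statuses/user_timeline` request URL that optionally begins at a given `max_id`. The query parameters (`screen_name`, `count`, `include_rts`, `max_id`) are real parameters of that endpoint. The class name, method name, and the idea of feeding `startMaxId` from a rip.properties key are assumptions for illustration and do not exist in ripme as far as this report shows.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; ripme has no such class or setting in this report.
class TimelineUrlSketch {
    static final String BASE =
            "https://api.twitter.com/1.1/statuses/user_timeline.json";

    // startMaxId <= 0 means "start from the newest tweet" (no max_id sent).
    static String buildUrl(String screenName, int count, long startMaxId, boolean includeRts) {
        StringBuilder sb = new StringBuilder(BASE);
        sb.append("?screen_name=")
          .append(URLEncoder.encode(screenName, StandardCharsets.UTF_8));
        sb.append("&count=").append(count);
        sb.append("&include_rts=").append(includeRts);
        if (startMaxId > 0) {
            // max_id is inclusive: results have ids less than or equal to it
            sb.append("&max_id=").append(startMaxId);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 123456789 is a placeholder id, not a real tweet.
        System.out.println(buildUrl("0zmnds", 200, 123456789L, true));
    }
}
```

Whether a manual starting point would actually reach older tweets depends on how far back the endpoint itself is willing to go, which this sketch does not address.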
Expected Behavior
Expected a clean rip of the media posts of the account from top to bottom.
Actual Behavior
ripme grabbed the first 64 media posts and then quit with 'Rip Complete'. No errors in the log. Repeating the rip just iterates over the same items and quits again at the same spot, even though there is more content beyond the point where it stops.
I repeated this several times, then went away for a while, came back and tried again, and this time it nabbed 103 images (inclusive of those it had grabbed before) and then quit, again prematurely. Repeating the rip then had it stop at this number over and over as well. I'm not sure what changed between the first set of attempts and the second.
I tried several other twitter accounts and they ripped fine from start to finish. Not sure what's special on this one. The tweets are not private.
I tried ripping without URL History checked and there's no problem with that. I'm using the default Twitter auth; I tried using my own API key to see if it would make a difference, but I'm dumb and couldn't get that to work. [UPDATE] I was finally able to generate my own API v1.1 key, and this did not change the ripper's behavior at all. So I doubt it's a rate limit problem or anything else related to a shared API key.
I notice that towards the end of the rip, it grabs fewer and fewer items between 'Downloading next page' entries in the log, until at the end there are just several 'Downloading next page' lines without any image grabs at all, despite this account being basically all self-posted media from top to bottom.
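To put numbers on that thinning-out, a small log-counting sketch like the one below could tally downloads between page markers. The 'Downloading next page' marker comes from the log described above. The download marker string is an assumption, since the exact wording of ripme's download lines is not quoted in this report, so it is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical log helper; the download marker text is an assumption.
class LogPageCountSketch {
    // Returns the number of download lines seen before each page marker,
    // plus a final entry for lines after the last marker.
    static List<Integer> downloadsPerPage(List<String> lines,
                                          String pageMarker,
                                          String downloadMarker) {
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        for (String line : lines) {
            if (line.contains(pageMarker)) {
                counts.add(current); // close out the page that just ended
                current = 0;
            } else if (line.contains(downloadMarker)) {
                current++;
            }
        }
        counts.add(current);
        return counts;
    }

    public static void main(String[] args) {
        // Synthetic lines for illustration only, not a real ripme log.
        List<String> sample = Arrays.asList(
                "Downloaded a.jpg", "Downloaded b.jpg",
                "Downloading next page",
                "Downloaded c.jpg",
                "Downloading next page",
                "Downloading next page");
        System.out.println(
                downloadsPerPage(sample, "Downloading next page", "Downloaded "));
    }
}
```

A run of trailing zeros in the result would correspond to the empty pages at the end of the rip.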