Open Honowski opened 4 years ago
I second that request. I tried with a couple of users, and this library downloads only the most recent images (I use JD to count the number of media, by the way). I've also tried raising --limit to, for example, 20000, but to no avail.
After modifying the code a bit to add some logging, I think the issue is on Twitter's side, as the program reaches line 126 and exits.
Twitter imposes limits on its APIs; see, for example, https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline
If we try to get up to 20000 tweets at once, maybe we will reach the limit.
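For reference, here is a minimal sketch of how paging through a timeline usually works, in case it helps with debugging. As documented for GET statuses/user_timeline, a single request returns at most 200 tweets (`count`), and only roughly the 3,200 most recent tweets are reachable, so a --limit of 20000 cannot be served in one call. `fetch_page` and `make_fake_timeline` are hypothetical stand-ins, not this library's code or a real API client; the fake simply serves descending integer IDs.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, int]
PageFetcher = Callable[[int, Optional[int]], List[Tweet]]

MAX_PAGE_SIZE = 200  # documented per-request maximum for `count`


def fetch_timeline(fetch_page: PageFetcher, limit: int) -> List[Tweet]:
    """Collect up to `limit` tweets, newest first, by walking `max_id` backwards."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        count = min(MAX_PAGE_SIZE, limit - len(collected))
        page = fetch_page(count, max_id)
        if not page:
            break  # nothing older is reachable
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


def make_fake_timeline(total: int) -> PageFetcher:
    """Hypothetical stand-in for the API: serves IDs total..1, newest first."""
    ids = list(range(total, 0, -1))

    def fetch_page(count: int, max_id: Optional[int]) -> List[Tweet]:
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return fetch_page


if __name__ == "__main__":
    tweets = fetch_timeline(make_fake_timeline(450), limit=20000)
    print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

If a loop like this stops early, the page that came back empty (or the point where `max_id` stopped decreasing) is the first thing worth logging.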
The problem is a bit different, as I'm not hammering the API. For example, I want to download the media for this user: I launch the program and it downloads some of the media (55, to be precise). I expected them to be the most recent ones, but if I download the user's data with JDownloader, starting from the same ID, it downloads a different set of media (114 items), so some are missing from the first run. Am I right in assuming that the ID increases monotonically across all Twitter users? If so, and if downloading older media is not possible due to some API limit, shouldn't both programs download the same number of media?
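On the ID question: as far as I know, tweet IDs have been Snowflake IDs since late 2010, so they are allocated globally (not per user) and the upper bits hold a millisecond timestamp. That makes them roughly time-ordered across all accounts, though not strictly sequential. Below is a small sketch that decodes the timestamp. The epoch constant and the 22-bit shift are the commonly cited Snowflake layout, taken here as an assumption rather than from this library, and the two sample IDs are ones that come up later in this thread.

```python
from datetime import datetime, timezone

# Assumed Snowflake layout: the bits above the low 22 hold milliseconds
# elapsed since this custom epoch (4 November 2010, UTC).
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Decode the creation time embedded in a Snowflake-style tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for tweet_id in (1250701846479544320, 1246422948057153536):
        print(tweet_id, tweet_id_to_datetime(tweet_id).isoformat())
```

Because the timestamp sits in the high bits, comparing two IDs numerically is enough to tell which tweet is newer, regardless of which account posted it.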
exclude_replies/include_rts in the API parameters may cause the difference.
The "Force grab media" option in JDownloader is enabled (so it won't crawl "retweets and other content from users' timelines"), and I don't use the -rts flag. I've also re-checked some of the missing media: some of them are indeed replies, but others aren't.
A "reply tweet" or "retweet with comment" can cause a difference if there is media in both the reply and the original tweet. Can you give the user ID or the tweet IDs with missing media?
Sure. The user ID is AngelOfGears. Regarding the tweet IDs: twitter-dl.txt, JDownloader.txt. Those are all the IDs that JD and twitter-dl returned; I've sorted them and then truncated JD's list so that both lists start at the same ID.
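For anyone who wants to compare two ID lists like these, a set difference shows exactly which tweets each tool missed. This is only a sketch: it assumes each file holds one numeric tweet ID per line, and the IDs in the demo are made-up placeholders, not values from the attached files.

```python
from typing import Iterable, List, Set, Tuple


def parse_ids(lines: Iterable[str]) -> Set[int]:
    """Parse one tweet ID per line, skipping blank lines."""
    return {int(line) for line in lines if line.strip()}


def compare_id_lists(
    a_lines: Iterable[str], b_lines: Iterable[str]
) -> Tuple[List[int], List[int]]:
    """Return (IDs only in A, IDs only in B), each sorted newest first."""
    a, b = parse_ids(a_lines), parse_ids(b_lines)
    return sorted(a - b, reverse=True), sorted(b - a, reverse=True)


if __name__ == "__main__":
    # Placeholder data; in practice pass open("twitter-dl.txt") and
    # open("JDownloader.txt") instead of these lists.
    tool_a = ["105\n", "103\n", "101\n"]
    tool_b = ["105\n", "104\n", "103\n", "102\n", "101\n"]
    only_a, only_b = compare_id_lists(tool_a, tool_b)
    print("only in A:", only_a)
    print("only in B:", only_b)
```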
Let me know if you need more information and thank you for your time :)
Can you check whether JD produces different file names with the same content?
JD did download files with the same content but from different tweet IDs, for example 1250701846479544320 and 1246422948057153536, while twitter-dl downloaded only the latter. Also, I don't understand why you added the --rts flag; JD does not download retweets (unless you uncheck an option in the settings).
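One way to run that check is to hash every downloaded file and group the names that share a digest. This is a minimal sketch: `load_directory` assumes all downloads sit in one flat folder, and the file names in the demo are hypothetical, not JD's actual naming scheme.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_directory(path: str) -> Dict[str, bytes]:
    """Read every regular file in a flat download folder into memory."""
    return {p.name: p.read_bytes() for p in Path(path).iterdir() if p.is_file()}


def group_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Return groups of file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


if __name__ == "__main__":
    # Hypothetical names and contents, for illustration only.
    sample = {
        "tweet_a_photo.jpg": b"same image bytes",
        "tweet_b_photo.jpg": b"same image bytes",
        "tweet_c_photo.jpg": b"different image bytes",
    }
    print(group_duplicates(sample))
```

Note that this only catches exact copies; the same picture re-encoded at a different size or quality would hash differently.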
I've noticed this only scrapes about half of the total images on a person's Twitter. It grabs all the videos, I believe.