Closed by ghost 1 year ago
This is a limit on twitter's end, unfortunately.
I personally ended up having to make a PowerShell script that cycles through two-week intervals starting from the user's registration date using Twitter search queries, with one day of overlap in both `since:` and `until:` so nothing gets skipped over (with `include:nativeretweets` in the search and `&f=live` instead of `&f=top` in the URL).
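The interval trick above can be sketched like this (in Python rather than PowerShell; the `USER` placeholder, the window length, and the `since:`/`until:` date operators are my assumptions about what the actual script does):

```python
from datetime import date, timedelta

def search_windows(start, end, days=14, overlap=1):
    """Yield Twitter search queries covering [start, end] in two-week
    windows that overlap by one day so no tweets are skipped."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        yield (f"from:USER include:nativeretweets "
               f"since:{cur:%Y-%m-%d} until:{nxt:%Y-%m-%d}")
        if nxt == end:
            break
        # step forward, keeping one day of overlap with this window
        cur = nxt - timedelta(days=overlap)

for q in search_windows(date(2019, 1, 1), date(2019, 2, 15)):
    print(q)
```

Each emitted query would then be URL-encoded and fed to gallery-dl as a `https://twitter.com/search?q=...&f=live` link.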
Even then, I don't think that will get everything, because Twitter is dumb like that: it caps retrieving a user's timeline at around 3,200 tweets (and that includes retweets).
What usually works for me is using a search query to download:
gallery-dl "https://twitter.com/search?q=(from:USER)"
Twitter shadow-hides all NSFW content from the search feature, so I think there's no way to download everything; you'd have to use https://stevesie.com/apps/twitter-api/scrape/tweets/by-user
From my experience, it seems people usually don't flag stuff as "sensitive content"; I've seen that with both art and IRL stuff. I've been using the API to get the total tweet count of 100% NSFW accounts and compared it against how much I could get from the search results, and it's usually most of the tweets. I tried searching an NSFW tag in the browser, and tweets marked by Twitter as "sensitive content" (ones that require you to verify you want to see them) still popped up occasionally. Here's how they determine what's allowed in search results: https://help.twitter.com/en/using-twitter/twitter-search-not-working
it's been blocked since 2020
Bumping this because I'm trying to consolidate my various twitter archives. I have a lot of content that was downloaded from twMediaDownloader and I planned to merge this content with gallery-dl using twitter-click-and-save to minimize file duplication by hardlinking across the drive.
My example case is casulcasulcasul with the following settings:
"twitter":
{
    "sleep": 3.0,
    "sleep-request": 3.0,
    "archive": "/run/media/xxx/bfd18/dl/gallery-dl/sql/twitter.sqlite3",
    "archive-format": "{author[name]}—{date:%Y.%m.%d}—{tweet_id}—(unknown)",
    "username": "xxx",
    "password": "yyy",
    "cards": false,
    "conversations": false,
    "quoted": false,
    "replies": true,
    "retweets": false,
    "text-tweets": false,
    "twitpic": false,
    "users": "timeline",
    "videos": true,
    "filename": "[twitter] {author[name]}—{date:%Y.%m.%d}—{tweet_id}—(unknown).{extension}"
},
As this account seems to fall under 1,000 media tweets, I try `gallery-dl https://twitter.com/casulcasulcasul/media`.
Out of the 970 media files twMediaDownloader calculated (using dryrun), gallery-dl using the above command downloaded 944, seemingly omitting anything earlier than November 2019.
Using `gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)"` we get a bit further (but the process is much slower): this results in 968 files...
I don't quite know what/where the issue is, but the two tweets gallery-dl seems to miss are this one and this one. Manually grabbing those tweets with gallery-dl downloads just fine.
EDIT: I guess theoretically, you could use twMediaDownloader to generate a list of media tweets and use the .csv file it provides as input for gallery-dl. :thinking:
EDIT2: Some issues state to include `filter:media` in your gallery-dl command, but I have never once had this work. However, I have found `&f=media` works, so the full command would be `gallery-dl "https://twitter.com/search?q=(from:username)&f=media"`. The quotes are important on the command line, but they aren't needed if an input file is used. This command gets more media than just `gallery-dl https://twitter.com/username/media` and is faster than `gallery-dl "https://twitter.com/search?q=(from:username)"` alone.
Are you sure `gallery-dl "https://twitter.com/search?q=(from:username)&f=media"` makes a difference? From what I see, it shouldn't.
You can use both https://twitter.com/casulcasulcasul/media and https://twitter.com/search?q=(from:casulcasulcasul) links. Once you've downloaded the first one, copy the ID of the last downloaded tweet and put it in a search like this: https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE. To speed up the process, you can add `filter:links` (https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links).
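Building that follow-up link can be scripted; here's a small sketch (the helper name is mine, and I've used `%20` for the separators since gallery-dl's query parser handles spaces but, as discussed later in this thread, not `+` signs):

```python
def next_search_url(user, last_tweet_id, extra="filter:links"):
    """Build the follow-up search URL from the ID of the last tweet
    the /media timeline returned, encoding spaces as %20."""
    query = f"from:{user} max_id:{last_tweet_id}"
    if extra:
        query += f" {extra}"
    return "https://twitter.com/search?q=" + query.replace(" ", "%20")

print(next_search_url("casulcasulcasul", 1081020936185274371))
# → https://twitter.com/search?q=from:casulcasulcasul%20max_id:1081020936185274371%20filter:links
```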
Or you can download the latest artifact and simply paste https://twitter.com/casulcasulcasul if you don't want to bother with the two links yourself.
I will scream it from the rooftops:
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371+filter:links
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)"
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.13—1558577888386920448—FaEnVbxUIAAuR8J.jpg
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.12—1558220785201713152—FZ-ytOZakAEydm7.jpg
Adding `+` or `filter:` will not work with my install of gallery-dl. `--version` output: `1.23.0-dev` (linux)
@cglmrfreeman use `%20` or plain spaces instead of `+` signs
$ gallery-dl https://twitter.com/search?q=from:casulcasulcasul%20max_id:1081020936185274371%20filter:links
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...
$ gallery-dl "https://twitter.com/search?q=from:casulcasulcasul max_id:1081020936185274371 filter:links"
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...
Huh, that one worked. I don't think I've ever seen anyone suggest that before. It's always "copy the Twitter search URL" (https://twitter.com/search?q=from%3Acasulcasulcasul+max_id%3A1081020936185274371+filter%3Alinks), which does not work, or use the `+` or `&` signs that seemingly throw `Requested user could not be found`.
I will def be using this from now on, thanks!
You probably should have put the link in double quotes ("https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links") to get `+` working.
But as mikf said, plain spaces are fine too, in double quotes as well.
No, `+` signs as space replacements do not work in gallery-dl. The function that parses query parameters does not "support" them, meaning it returns `+` as-is and does not replace it with a space character as might be expected.
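The difference is easy to see by contrasting Python's stdlib parser with a naive one (`naive_parse` here is my stand-in for a parser that doesn't decode `+`; I'm not claiming it's gallery-dl's actual code):

```python
from urllib.parse import parse_qs

# A form-encoding-aware parser decodes '+' as a space ...
print(parse_qs("q=from:user+filter:links"))
# → {'q': ['from:user filter:links']}

# ... but a naive split-based parser keeps the '+' literally,
# so the whole value stays one token.
def naive_parse(query):
    return dict(part.split("=", 1) for part in query.split("&"))

print(naive_parse("q=from:user+filter:links"))
# → {'q': 'from:user+filter:links'}
```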
I see. Well, it still works with Twitter specifically. Pluses in a query string are just ignored by Twitter (or treated as spaces).
Oh, so the "NotFoundError"s are a bug introduced with https://github.com/mikf/gallery-dl/commit/77bdd8fe0f1702955d0746a81ea7a24c9d1bb065. This commit splits search queries by whitespace only, and throws an error because there is no user named `casulcasulcasul+max_id:1081020936185274371+filter:links`.
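That failure mode is easy to reproduce with a whitespace-only tokenizer (my approximation of what the linked commit does, not the actual gallery-dl code):

```python
def extract_users(query):
    """Split a search query on whitespace and collect from: names,
    the way a whitespace-only tokenizer would."""
    return [tok[len("from:"):] for tok in query.split()
            if tok.startswith("from:")]

# With spaces, the from: token is isolated correctly:
print(extract_users("from:casulcasulcasul max_id:123 filter:links"))
# → ['casulcasulcasul']

# With '+' separators there is no whitespace to split on, so the
# "user name" swallows the rest of the query -> NotFoundError.
print(extract_users("from:casulcasulcasul+max_id:123+filter:links"))
# → ['casulcasulcasul+max_id:123+filter:links']
```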
Ah I only recently started using gallery-dl for twitter archiving and I definitely updated after that, so that might explain it.
I see x2. I'm on the latest stable version, so I didn't notice; I thought you would leave the behavior for search as it was. I guess you should also consider that there can be multiple `from:` in a query, if you haven't already. Also, `@` can be used instead of `from:`.
For smaller galleries, `gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links"` does not grab nearly as much as `gallery-dl https://twitter.com/Cotonus`, and `gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media"` only grabs 3 files. Twitter filters really suck these days.
If you mean retweets, you should add `include:nativeretweets` to the search.
I don't mean retweets.
`gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links"` - 28 files
`gallery-dl https://twitter.com/Cotonus` - 32 files
`gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media"` - 3 files
Yeah, there are 2 posts which don't appear in the search at all, even without filters.
Popping back in here to say after fairly extensive testing, `gallery-dl https://twitter.com/username` is actually giving the maximum number of results at this point.
> Popping back in here to say after fairly extensive testing, `gallery-dl https://twitter.com/username` is actually giving the maximum number of results at this point.
Usually username and username/media, but I'm pretty sure that if their Twitter has that many retweets and media, you can't get everything. Try an account with some 5-10k tweets to see; that's Twitter's limit.
I tried to download every MP4 from this gallery (NSFW), and it only went as far back as this tweet (also NSFW). After that tweet, it just stopped and acted as though it had downloaded the entire gallery, meaning that any older tweets, such as this one, were excluded.
If I don't use the link for the media tab, it stops at an even more recent tweet (NSFW).
For reference, the command I ran was `gallery-dl "https://twitter.com/furui_1111/media" --filter "extension in ('mp4')"`.