mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Question] Twitter timeline strategy #3708

Open skulkexpert opened 1 year ago

skulkexpert commented 1 year ago

I've noticed that there is an option called extractor.twitter.timeline.strategy. I've been using a system where I first download a user's media timeline with https://twitter.com/{user}/media and then catch any missing tweets by running https://twitter.com/search?q=from:{user}. This was what was recommended when I first made these settings. Is there a better way to accomplish this now, without having to make two runs of gallery-dl per user?

Here is my config. I only want images from the user in the link, unless it is a quoted tweet, in which case I download the quoted tweet (not retweet) into the first user's folder. I also want any media that the user has in their replies, but only from the same user.

"twitter":
{
    "username": REDACTED,
    "password": REDACTED,
    "parent-directory": true,
    "quoted": true,
    "replies": "self",
    "expand": true,
    "pinned": true,
    "retweets": false,
    "twitpic": true,
    "videos": true,
    "cards": true,
    "metadata": true,
    "postprocessors": [{            
        "name": "metadata",
        "event": "post",
        "mode": "json",
        "filename": "{tweet_id}.json"                
    }]
}
mikf commented 1 year ago

> Is there a better way to accomplish this now, without having to make two runs of gallery-dl per user?

There is. Since v1.22.0, using twitter.com/USERNAME as the input URL applies the strategy you described by default.

extractor.twitter.timeline.strategy lets you control which timeline gets used first, since a user might want something different than the Tweets from https://twitter.com/{user}/media.
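
For reference, here is a minimal sketch of how this option might be written in a config file. It assumes that the dotted name extractor.twitter.timeline.strategy corresponds to nested "extractor", "twitter", and "timeline" objects, and that "media" is an accepted value. Both are assumptions worth checking against the configuration documentation for your gallery-dl version:

```json
{
    "extractor": {
        "twitter": {
            "timeline": {
                "strategy": "media"
            }
        }
    }
}
```

Under that assumption, the other options (such as "quoted" or "replies") would stay directly under "twitter", and only the strategy sits inside the "timeline" block.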

> I only want images from the user in the link, unless it is a quoted tweet, in which case I download the quoted tweet (not retweet) into the first user's folder. I also want any media that the user has in their replies, but only from the same user.

Your config should do just that.

You could add "image-filter": "author is user" to ensure that all media files are from the user in the input URL.

skulkexpert commented 1 year ago

@mikf Thanks for the reply. Here is my new config:

    "username": REDACTED,
    "password": REDACTED,
    "parent-directory": true,
    "timeline.strategy": "media",
    "image-filter": "author is user",
    "quoted": true,
    "replies": "self",
    "expand": true,
    "pinned": true,
    "retweets": false,
    "twitpic": true,
    "videos": true,
    "cards": true,
    "metadata": true,
    "postprocessors": [{
        "name": "metadata",
        "event": "post",
        "mode": "json",
        "filename": "{tweet_id}.json"
    }]

However, this doesn't seem to be downloading quoted tweets (though I'm not entirely sure that I want them). I wonder if I could get the text from the quoted tweets (both the tweet that quoted and the tweet that got quoted) without downloading the media from the quoted user's tweet (if they aren't the same as the user being downloaded)?

EDIT: Another question: when updating my Twitter galleries, I would usually use --abort 10 on both the media and the search link. Would doing the same be enough to capture any new tweets without missing anything with the new timeline strategy option (using only twitter.com/USERNAME)? Also, expand is taking up a lot of requests. Is it really necessary with my setup in order to get all of the user's media?

UPDATE: Using these settings seems to miss one or two tweets that you get with https://twitter.com/search?q=from:{user} for some reason.