mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Twitter] How do you download only text-tweets from your bookmarks? #5672

Open ShadowX1X opened 3 months ago

ShadowX1X commented 3 months ago

I only recently realised that my bookmarks archive didn't have text-only tweets downloaded, so I'm retroactively trying to download the ones I've missed. Currently, the relevant Twitter section in my config file looks like this:

"postprocessors": [
    {
        "name": "metadata",
        "filename": "{date_bookmarked}_{tweet_id}_{num:>02}.txt",
        "content-format": [
            "[Tweet text]:\n\n{content}\n\n\n=========================================================\n\n",
            "[Page address]:\nhttps://twitter.com/{author[name]}/status/{tweet_id}\n",
            "[Image address]:\n{file_url}\n",
            "[Author handle]:\n{author[name]}\n",
            "[Author nickname]:\n{author[nick]}\n\n\n",
            "[Date posted]:\n{date}\n",
            "[Date retrieved]:\n{time:now()}\n",
            "[Date bookmarked]:\n{date_bookmarked}\n\n\n",
            "[Number of views]: Unknown",
            "[Number of likes]: {favorite_count}",
            "[Number of retweets]: {retweet_count}",
            "[Number of quote retweets]: {quote_count}",
            "[Number of replies]: {reply_count}"
        ]
    }
],

"cards": true,
"conversations": true,
"pinned": false,
"quoted": true,
"replies": true,
"retweets": true,
"text-tweets": true,
"twitpic": false,
"videos": false

This config skips text-only tweets (tweets without media). I tried adding "event": "post" to the postprocessor, and that did catch the text-only tweets. However, it also wrote metadata for images I had already downloaded instead of skipping them: it duplicated the txt metadata I already had, except that the duplicate filenames end in None instead of the enumeration from {num:>02}.

Thanks!

mikf commented 3 months ago

Add

"filter": "count == 0"

to your post processor.

"event": "post" is also necessary.

ShadowX1X commented 3 months ago

I got a weird result using those: for some reason it created metadata txts for several bookmarks I already had (these bookmarks already have image + metadata txt), seemingly at random.

mikf commented 3 months ago

count contains the number of files of a Tweet, so count == 0 should only select text-only Tweets and never a Tweet with images, videos, etc. And it definitely shouldn't be doing anything "random".

https://github.com/mikf/gallery-dl/blob/31bdb288ef2e173b268cc115a73f064856351eab/gallery_dl/extractor/twitter.py#L135

You should probably also be using --no-download and/or --filter 0 to ignore any actual files, but that's optional.
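As a rough illustration of the count == 0 filter described above, here is a minimal Python sketch that evaluates a filter expression against per-Tweet metadata. It is not gallery-dl's actual implementation: the sample dicts and the make_filter helper are hypothetical, and only the field name count comes from this thread.

```python
# Hypothetical post-level metadata; only "count" mirrors a field named in the thread.
tweets = [
    {"tweet_id": 1, "count": 0, "content": "just text"},
    {"tweet_id": 2, "count": 2, "content": "two images"},
    {"tweet_id": 3, "count": 0, "content": "more text"},
]


def make_filter(expression):
    """Compile a filter expression into a predicate over metadata dicts."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Field names resolve against the metadata dict; builtins are disabled.
        return bool(eval(code, {"__builtins__": {}}, metadata))

    return predicate


text_only = make_filter("count == 0")
selected = [t["tweet_id"] for t in tweets if text_only(t)]
print(selected)  # [1, 3]
```

Under these assumptions, only the Tweets whose count is 0 pass the filter, so a Tweet with images or videos would not be selected.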

ShadowX1X commented 3 months ago

I tried both --no-download and --filter 0 and both resulted in the same behavior as before.

What I do notice is that the erroneous duplicate metadata txt files seemingly do not detect the images in their posts: the filenames all end in None, and the image address I record inside the file is also "None". Both would make sense for a text-only post, because of {num:>02} in the txt filename and {file_url} inside the txt for the image URL, but these erroneous files come from posts which DO have images, like this one (mostly SFW), whose metadata txt is named 2024-05-18 09_56_55_1776426395477958671_None.txt.

When I have "event": "post" and "filter": "count == 0" in the config and download new bookmarked image posts I've added since yesterday, the same thing happens for ALL the metadata txts created alongside each image, i.e. {num:>02} returns None and {file_url} returns "None" in every metadata txt file. Removing only "filter": "count == 0" still produces the erroneous metadata txts. Removing only "event": "post" results in nothing new being downloaded and no metadata txt files being created, which is the intended effect.

I am on gallery-dl v1.26.8-dev so perhaps something was fixed in an update? I will try updating and see... EDIT: no difference after getting the latest dev version.
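One plausible reading of the None suffix is that a postprocessor running on the "post" event only sees post-level metadata, where per-file fields such as num and file_url are not set yet. The Python sketch below reproduces that effect under stated assumptions: DefaultingFormatter is a hypothetical stand-in that substitutes the string "None" for missing fields, date_bookmarked is simplified to a pre-sanitized string, and none of this is gallery-dl's own code.

```python
import string


class DefaultingFormatter(string.Formatter):
    """Hypothetical stand-in: missing named fields become the string "None"."""

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            return kwargs.get(key, "None")
        return super().get_value(key, args, kwargs)


FILENAME = "{date_bookmarked}_{tweet_id}_{num:>02}.txt"
fmt = DefaultingFormatter()

# Post-level metadata: no per-file fields such as "num" or "file_url".
post_meta = {
    "date_bookmarked": "2024-05-18 09_56_55",
    "tweet_id": 1776426395477958671,
}
# File-level metadata: the same post, plus the per-file counter.
file_meta = dict(post_meta, num=1)

post_name = fmt.vformat(FILENAME, (), post_meta)
file_name = fmt.vformat(FILENAME, (), file_meta)
print(post_name)  # 2024-05-18 09_56_55_1776426395477958671_None.txt
print(file_name)  # 2024-05-18 09_56_55_1776426395477958671_01.txt
```

If this reading is right, the None in the filename says the metadata was written once per post rather than once per file. It does not by itself show whether the post has images.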