mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.78k stars, 965 forks

twitter | help with skipping/--no-skip and --filter, and config #6198

Closed: Baniita closed 1 month ago

Baniita commented 1 month ago

Hi, I'm about to lose my mind bc I've had so much trouble trying to get gallery-dl to work for me, please help... I have so many issues...

  1. I ran a trial run earlier with --filter and it ran fine after eleventeen tries (didn't always work, sometimes it seemed to ignore my --filter) but I was trying to test some commands, and now subsequent runs are skipping those posts, which I DON'T want.
     a) Where is the history of seen posts stored? Maybe I can clear those out? Is it just cache.sqlite3, or is it also stored elsewhere? When I renamed cache.sqlite3, I think it still skipped posts marked seen...?
     b) There are 3 "skip: true/false" areas in my config. I'm not sure which one I want to adjust to make it not skip stuff that's already been downloaded.

  2. Do these commands look right?

gallery-dl "https://twitter.com/tls6491" --cookies "C:\Users\Bani\AppData\Roaming\gallery-dl\cookies.txt" -d "C:\Users\Bani\Pictures\zz DL" --filter "datetime(2024, 9, 10) <= date < datetime(2024, 9, 17)"

gallery-dl "https://twitter.com/tls6491" --cookies "C:\Users\Bani\AppData\Roaming\gallery-dl\cookies.txt" -d "C:\Users\Bani\Pictures\zz DL" --filter "date >= datetime(2024, 9, 15)" --verbose --no-skip

  3. Does --filter conflict with --no-skip? I wanted it to re-download everything within a certain date range. It seemed to ignore the filter entirely when metadata and postprocessing were in the config.

  4. Re: my config...
     a) Is there anywhere that explains the cards, conversations, strategy options, path-extended fields? I don't know what they mean or do.
     b) what is the difference between "users": "user" and "users": "timeline"? my old config used timeline.
     c) I copied most of these fields from other people... I have no idea what they do, particularly the post-processor lol... what does that post-processor do... er... anyway, keeping that one in causes me a lot of problems...?
     d) how does the rest of the config look?

    {
    "extractor":
    {
        "base-directory": "./gallery-dl/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": "C:/Users/Bani/AppData/Roaming/gallery-dl/cookies.txt",
        "cookies-update": true,
        "proxy": null,
        "skip": false,
    
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,
    
        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,
    
        "path-restrict": "auto",
        "path-replace": "_",
        "path-remove": "\\u0000-\\u001f\\u007f",
        "path-strip": "auto",
        "path-extended": true,
    
        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },
        "twitter":
        {
            "username": "tls6491",
            "password": "-redacted-",
            "filename": "{author['name']}-{tweet_id}-0{num}.{extension}",
            "base-directory": "C:/Users/Bani/Pictures/zz DL",
            "sleep": 2,
            "sleep-request": 2,
            "ratelimit": "wait:1800",
            "cards": false,
            "conversations": true,
            "pinned": true,
            "quoted": true,
            "replies": true,
            "retweets": true,
            "strategy": null,
            "locked": "wait",
            "twitpic": true,
            "unique": true,
            "users": "user",
            "videos": false,
            "expand": false,
            "relogin": true,
            "size": "orig",
            "skip": false,
            "path-extended": false,
            "retries": 1,
            "retry-codes": [429, 430],
            "metadata": true,
            "postprocessors":[
                {
                    "name": "mtime",
                    "key": "date"
                },
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{author['name']}-{tweet_id}-{num}_{date:?//%Y-%m-%d %H_%M_%S}.json"
                }
            ]
        }
    },
    
    "downloader":
    {
        "filesize-min": null,
        "filesize-max": null,
        "mtime": true,
        "part": true,
        "part-directory": null,
        "progress": 3.0,
        "rate": null,
        "retries": 8,
        "timeout": 30.0,
        "verify": true,
    
        "http":
        {
            "adjust-extensions": true,
            "chunk-size": 32768,
            "headers": null,
            "validate": true
        },
    
        "ytdl":
        {
            "format": null,
            "forward-cookies": false,
            "logging": true,
            "module": null,
            "outtmpl": null,
            "raw-options": null
        }
    },
    
    "output":
    {
        "mode": "auto",
        "progress": true,
        "shorten": true,
        "ansi": false,
        "colors": {
            "success": "1;32",
            "skip"   : "2"
        },
        "skip": false,
        "log": "[{name}][{levelname}] {message}",
        "logfile": null,
        "unsupportedfile": null
    },
    
    "netrc": false
    }

mikf commented 1 month ago

a) Where is the history of seen posts stored?

In a download archive, which you haven't enabled, so nowhere. It won't overwrite already existing files though, at least not by default. That's what --no-skip / "skip": false is for.

b) There are 3 "skip: true/false" areas in my config. I'm not sure which one I want to adjust to make it not skip stuff that's already been downloaded.

The one in the twitter block.

You might want to re-enable the output.skip one, since it won't display skipped downloads otherwise.

Do these commands look right?

I think so.

Does --filter conflict with --no-skip?

It doesn't.

--filter makes gallery-dl completely ignore files for which the filter expression is false. --no-skip causes gallery-dl to overwrite already downloaded files, but only those that passed the filter.
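
A rough sketch of how the two options combine, written as plain Python rather than taken from gallery-dl's source. The `decide` function and its return values are made up for illustration; the filter string is the one from the first command above, evaluated as a Python expression against a file's metadata.

```python
from datetime import datetime


def decide(meta, filter_expr, file_exists, skip=True):
    """Return 'ignore', 'skip', or 'download' for one file (hypothetical helper)."""
    # The filter is a Python expression evaluated with the metadata as variables.
    namespace = {"datetime": datetime, **meta}
    if not eval(filter_expr, namespace):
        return "ignore"    # filtered out: --no-skip never gets to see this file
    if file_exists and skip:
        return "skip"      # default behaviour: leave the existing file alone
    return "download"      # new file, or an overwrite because skip is off


FILTER = "datetime(2024, 9, 10) <= date < datetime(2024, 9, 17)"
in_range = {"date": datetime(2024, 9, 12, 8, 30)}   # made-up tweet dates
too_old = {"date": datetime(2024, 9, 1)}

print(decide(in_range, FILTER, file_exists=True))               # skip
print(decide(in_range, FILTER, file_exists=True, skip=False))   # download
print(decide(too_old, FILTER, file_exists=True, skip=False))    # ignore
```

The last call is the case asked about: with skipping turned off, a file outside the date range is still ignored, because the filter is applied first.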

Is there anywhere that explains the cards, conversations, strategy options, path-extended fields?

https://gdl-org.github.io/docs/configuration.html#extractor-twitter-ads (scroll down a bit to see all twitter options)

what is the difference between "users": "user" and "users": "timeline"?

https://gdl-org.github.io/docs/configuration.html#extractor-twitter-users

I copied most of these fields from other people... I have no idea what they do,

Nice.

what does that post-processor do

https://gdl-org.github.io/docs/configuration.html#postprocessor-options

The mtime one sets the mtime of downloaded files to the time stored in the date metadata field.

The metadata one writes each Tweet's metadata to an external .json file.
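
As a minimal sketch of what those two post processors amount to, using only the Python standard library and not gallery-dl's own code: `apply_mtime` and `write_metadata` are hypothetical helper names, the metadata values are invented, and treating the naive `date` as UTC is an assumption.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def apply_mtime(path, meta, key="date"):
    """Set the file's modification time from a metadata field (assumed to be UTC)."""
    ts = meta[key].replace(tzinfo=timezone.utc).timestamp()
    os.utime(path, (ts, ts))  # (access time, modification time)


def write_metadata(path, meta):
    """Write the metadata dict to a JSON file; non-JSON types are stringified."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(meta, fp, default=str, indent=4)


meta = {"tweet_id": 123, "date": datetime(2024, 9, 15, 12, 0, 0)}  # made-up values

with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "example.jpg")
    open(image, "wb").close()              # stand-in for a downloaded file
    apply_mtime(image, meta)
    write_metadata(image + ".json", meta)

    mtime = datetime.fromtimestamp(os.path.getmtime(image), timezone.utc)
    with open(image + ".json", encoding="utf-8") as fp:
        saved = json.load(fp)

print(mtime.replace(tzinfo=None) == meta["date"])  # True
print(saved["tweet_id"])                           # 123
```

Neither step downloads anything; both only act on a file and its metadata.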

ForxBase commented 1 month ago

How are you not getting banned by Twitter when using gallery-dl? Many are getting banned.

Baniita commented 1 month ago

So in the SQLite, nowhere else? I swear it kept marking things as seen even when I moved the sqlite lol... did I make an error..

the extractor section skip doesn't matter? or is it like, extractor is for every extractor, but twitter is just for twitter--

so... no difference between "user" and "timeline"? 😂 I assume functionally there's no difference...

the doc on postprocessors may be beyond me rip. I've been having issues with it enabled. will it still record my history even if I have postprocessor section on my twitter config off? (maybe I will want it to later, but when I used the postprocessor section, it just downloaded jsons and I worried it'll never download the actual pics, so I had to shut it down before I got ratelimited for a day)

thank you very much!

mikf commented 1 month ago

So in the SQLite, nowhere else?

In SQLite download archives (not the cache file), and in the files already present on the filesystem.
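
To illustrate the idea of a download archive, here is a simplified sketch using Python's `sqlite3` module. The `Archive` class, the table layout, and the entry string are assumptions for illustration and may not match what gallery-dl actually stores.

```python
import sqlite3


class Archive:
    """A tiny 'seen entries' store backed by SQLite (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def seen(self, entry):
        """Return True if this entry was recorded by an earlier download."""
        row = self.db.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (entry,)).fetchone()
        return row is not None

    def add(self, entry):
        """Record an entry after its file has been downloaded."""
        self.db.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
        self.db.commit()


archive = Archive()
entry = "twitter123_1"          # hypothetical ID format
first = archive.seen(entry)     # nothing recorded yet
archive.add(entry)
second = archive.seen(entry)    # now it counts as already downloaded
print(first, second)            # False True
```

With no archive configured, the only remaining "history" is the file on disk: if the target path already exists, the download is skipped unless skipping is disabled.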

or is it like, extractor is for every extractor, but twitter is just for twitter--

Exactly. When skip is defined for twitter, it overrides the "global" extractor.skip setting, but only for twitter URLs.
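
A simplified sketch of that lookup order follows. The `lookup` function is hypothetical and leaves out subcategory-level settings; the config values are chosen to show the override and are not the ones posted above.

```python
def lookup(config, category, key, default=None):
    """Resolve an extractor option: a site-specific value wins over the global one."""
    extractor = config.get("extractor", {})
    value = extractor.get(key, default)      # "global" extractor.<key>
    section = extractor.get(category, {})
    if key in section:                       # extractor.<category>.<key>
        value = section[key]
    return value


config = {
    "extractor": {
        "skip": True,                  # applies to every site...
        "twitter": {"skip": False},    # ...except where a site overrides it
    }
}

print(lookup(config, "twitter", "skip"))  # False
print(lookup(config, "pixiv", "skip"))    # True
```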

so... no difference between "user" and "timeline"?

You can set different options per subcategory (like user and timeline), but yes, there is no functional difference.

will it still record my history even if I have postprocessor section on my twitter config off

Yes. Post processors and archives for actual downloaded files are completely independent.