mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.85k stars 975 forks

unexpected error in gallery-dl while scraping twitter account. Instructions told me to report it here #4811

Closed (rarelygoeshere closed this issue 11 months ago)

rarelygoeshere commented 12 months ago

Hello everyone, I was scraping this account, and when I checked on it again I noticed the cmd window said this:

[twitter][error] An unexpected error occurred: KeyError - 'user'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .

Seeing this, I did what I believed the instructions asked for, and the image below is the output from the command. (Attached image: screenshot_1699942869)

But then I noticed it wants me to add a URL, so I tried again with gallery-dl https://twitter.com/malteserrefs --verbose

Sadly, it appears to have unexpectedly created too much output, to the point that I can't scroll up to copy the rest of the output the command generated.

I don't know how to fetch the part of the output that I can no longer reach by scrolling up, so I have copied the part that is still displayed to this txt file (warning: it is very long): gallery-dll text stuff.txt

I hope that is useful for anyone checking this. If not, I might need to run the same command again and figure out how to have the output saved to a txt file while it is being produced.

mikf commented 12 months ago

I'd need the full traceback, which you'd see together with [twitter][error] An unexpected error occurred … when --verbose is enabled.

The output file you uploaded doesn't seem to actually show any error.


Your current filename format skips any file after the first in each Tweet since they all get the same name. You should include a {num} field in your filenames to fix that.

rarelygoeshere commented 12 months ago

> I'd need the full traceback, which you'd see together with [twitter][error] An unexpected error occurred … when --verbose is enabled.
>
> The output file you uploaded doesn't seem to actually show any error.
>
> Your current filename format skips any file after the first in each Tweet since they all get the same name. You should include a {num} field in your filenames to fix that.

Dammit, sorry to tell you this, but a power outage occurred in my house and I lost the cmd window :( Would rerunning the same command help? I can simply move the folder it generated to another place to run it again

As for filename format, where should I add the {num} field? My format is the following: "filename": "{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})-{tweet_id}({date:%Y%m%d_%H%M%S}).{extension}"

Incidentally, is that why it only downloads the first image when it scrapes Tweets with multiple images? If so, that explains why it didn't get all the files I wanted.

Edit: After some testing with where to add the {num} field, I believe I've got it added in a good place to fix the problem. This is how I added it: "filename": "{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})-{tweet_id}({date:%Y%m%d_%H%M%S})-{num}.{extension}"
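To illustrate why the missing {num} field causes files to be skipped, here is a minimal sketch that renders both filename formats quoted above for one Tweet with three images. It uses Python's built-in str.format as a stand-in for gallery-dl's own formatter (an assumption: the two are similar for these fields, but this is not gallery-dl's code), and all metadata values are made up.

```python
from datetime import datetime

# The two filename formats quoted in this thread.
WITHOUT_NUM = ("{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})"
               "-{tweet_id}({date:%Y%m%d_%H%M%S}).{extension}")
WITH_NUM = ("{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})"
            "-{tweet_id}({date:%Y%m%d_%H%M%S})-{num}.{extension}")

# Hypothetical metadata for a single Tweet (values are invented).
tweet = {
    "author": {"name": "example", "id": 123,
               "date": datetime(2020, 1, 2, 3, 4, 5)},
    "tweet_id": 456,
    "date": datetime(2023, 11, 14, 6, 21, 9),
    "extension": "jpg",
}

# Three images in the same Tweet differ only in their "num" value.
files = [dict(tweet, num=n) for n in (1, 2, 3)]


def render(fmt, entries):
    """Render one filename per entry with plain str.format."""
    return [fmt.format(**entry) for entry in entries]


names_without_num = render(WITHOUT_NUM, files)
names_with_num = render(WITH_NUM, files)

print(len(set(names_without_num)))  # 1: all three images share one name
print(len(set(names_with_num)))     # 3: each image gets its own name
```

Because every image in the Tweet maps to the same name without {num}, only the first one can be written under that name.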

rarelygoeshere commented 11 months ago

Hello, I decided to run the command again and tried to save the output to a text file, like this: gallery-dl https://twitter.com/MalteserRefs --verbose >gallery-dl-text.txt

Before doing so, I updated gallery-dl to the latest version, which was released recently. The resulting output of the command is in the verbose output file below. The second file below is the result of redirecting the output to a text file with >. (This doesn't seem to have produced a good result, but I am including it anyway so you know what I tried.)

I hope these outputs are more useful to you than the ones I sent previously. Attachments: gallery-dl verbose output.txt, gallery-dl-text.txt
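A plain > redirect captures only standard output, while error and debug messages are usually written to standard error, which is a likely reason the redirected file was not useful. The sketch below shows one way to see the output and save it at the same time: it merges both streams, echoes each line, and writes it to a log file. The helper name run_and_log and the log path are hypothetical, and the demo uses a small stand-in child process rather than gallery-dl itself.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run command, echo every output line, and also save it to log_path."""
    with open(log_path, "w", encoding="utf-8") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same pipe
            text=True,
            encoding="utf-8",
            errors="replace",
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still visible in the console
            log.write(line)         # and kept in the file
        return process.wait()


# Stand-in child process that writes one line to each stream.
demo_command = [
    sys.executable, "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
log_path = os.path.join(tempfile.gettempdir(), "demo-log.txt")
exit_code = run_and_log(demo_command, log_path)
```

For the command from this thread, the call would look like run_and_log(["gallery-dl", "--verbose", "https://twitter.com/MalteserRefs"], "gallery-dl-verbose.txt"), assuming gallery-dl is on the PATH.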

mikf commented 11 months ago

I found the problem. Twitter's internal code sometimes experiences timeouts and it then responds with incomplete data, which causes these exceptions in gallery-dl, as well as a block of errors like the following, which are currently not getting handled.

I guess API responses with errors could just be retried until they succeed.

Timeout Error

```json
{
  "message": "Timeout: Unspecified",
  "locations": [
    {
      "line": 2004,
      "column": 3
    }
  ],
  "path": [
    "user",
    "result",
    "timeline_v2",
    "timeline",
    "instructions",
    0,
    "entries",
    0,
    "content",
    "itemContent",
    "tweet_results",
    "result",
    "legacy"
  ],
  "extensions": {
    "name": "TimeoutError",
    "source": "Server",
    "retry_after": 0,
    "code": 29,
    "kind": "ServiceLevel",
    "tracing": {
      "trace_id": "b974771ef99489c7"
    }
  },
  "code": 29,
  "kind": "ServiceLevel",
  "name": "TimeoutError",
  "source": "Server",
  "retry_after": 0,
  "tracing": {
    "trace_id": "b974771ef99489c7"
  }
},
```

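The idea of retrying API responses that contain errors could look roughly like the sketch below. It is not gallery-dl's implementation: the function name fetch_with_retry, the fetch callable, and the assumption that errors arrive in a top-level "errors" list are all hypothetical. It also caps the number of attempts instead of retrying forever, so a persistent failure still surfaces.

```python
import time


def fetch_with_retry(fetch, max_attempts=5, delay=1.0):
    """Call fetch() until the response carries no 'errors' entries.

    Assumes (hypothetically) that fetch() returns a dict and that
    server-side problems show up as a list under the key 'errors'.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        errors = response.get("errors") or []
        if not errors:
            return response
        if attempt < max_attempts:
            # Wait at least `delay`, or longer if the server asks for it.
            requested = max(e.get("retry_after", 0) for e in errors)
            time.sleep(max(delay, requested))
    raise RuntimeError(
        f"giving up after {max_attempts} attempts: {errors[0].get('message')}"
    )


def make_flaky_fetch(failures):
    """Build a fake fetch() that returns an error `failures` times."""
    calls = {"count": 0}

    def fetch():
        calls["count"] += 1
        if calls["count"] <= failures:
            return {
                "errors": [{"name": "TimeoutError",
                            "message": "Timeout: Unspecified",
                            "retry_after": 0}],
                "data": {},
            }
        return {"data": {"user": {"result": {}}}}

    return fetch, calls


fetch, calls = make_flaky_fetch(failures=2)
result = fetch_with_retry(fetch, delay=0.0)
print(calls["count"])  # 3: two failed attempts, then one success
```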
rarelygoeshere commented 11 months ago

> I found the problem. Twitter's internal code sometimes experiences timeouts and it then responds with incomplete data, which causes these exceptions in gallery-dl, as well as a block of errors like the following, which are currently not getting handled.
>
> I guess API responses with errors could just be retried until they succeed.
>
> Timeout Error

    {
      "message": "Timeout: Unspecified",
      "locations": [
        {
          "line": 2004,
          "column": 3
        }
      ],
      "path": [
        "user",
        "result",
        "timeline_v2",
        "timeline",
        "instructions",
        0,
        "entries",
        0,
        "content",
        "itemContent",
        "tweet_results",
        "result",
        "legacy"
      ],
      "extensions": {
        "name": "TimeoutError",
        "source": "Server",
        "retry_after": 0,
        "code": 29,
        "kind": "ServiceLevel",
        "tracing": {
          "trace_id": "b974771ef99489c7"
        }
      },
      "code": 29,
      "kind": "ServiceLevel",
      "name": "TimeoutError",
      "source": "Server",
      "retry_after": 0,
      "tracing": {
        "trace_id": "b974771ef99489c7"
      }
    },

I see, so this is a problem on Twitter's part then? Would you mind elaborating on what that means for gallery-dl and how I would go about dealing with it, if that's possible?
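As a rough illustration of how an incomplete response can turn into the KeyError - 'user' from the first post, the sketch below reads a field from a response whose data section is empty. The response shape and both function names are hypothetical. They are meant only to show the mechanism and do not reflect Twitter's actual schema or gallery-dl's code.

```python
# Hypothetical incomplete response: the server reports a timeout and
# leaves out the data that the client expects to find.
incomplete = {
    "errors": [{"name": "TimeoutError", "path": ["user", "result"]}],
    "data": {},
}


def user_result_strict(response):
    """Read the field directly; raises KeyError if 'user' is missing."""
    return response["data"]["user"]["result"]


def user_result_checked(response):
    """Check for missing data first and report the server errors."""
    data = response.get("data") or {}
    if "user" not in data:
        names = [e.get("name", "unknown") for e in response.get("errors", [])]
        raise ValueError(f"incomplete response, server errors: {names}")
    return data["user"]["result"]


try:
    user_result_strict(incomplete)
except KeyError as exc:
    strict_error = repr(exc)

try:
    user_result_checked(incomplete)
except ValueError as exc:
    checked_error = str(exc)

print(strict_error)   # KeyError('user')
print(checked_error)  # incomplete response, server errors: ['TimeoutError']
```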