mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Issue with Danbooru #1004

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

So I was trying to get some stuff (well, more like a lot, I suppose) from Danbooru, but while it was running, I ran into this error:

[danbooru][error] An unexpected error occurred: KeyError - 'id'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[danbooru][debug]
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\gallery_dl\job.py", line 67, in run
    for msg in self.extractor:
  File "c:\python38\lib\site-packages\gallery_dl\extractor\danbooru.py", line 52, in items
    for post in self.posts():
  File "c:\python38\lib\site-packages\gallery_dl\extractor\danbooru.py", line 96, in _pagination
    params["page"] = "b{}".format(posts[-1]["id"])
KeyError: 'id'

This is the command I was trying to use: gallery-dl "https://danbooru.donmai.us/posts?tags=azur_lane"

Is it possible for this to be looked into, and fixed?

kattjevfel commented 4 years ago

Please provide the exact entry that caused this error, and do as the error message asks: run it with --verbose. I doubt anyone is willing to download potentially literally 56k azur lane images.

That said, I'm currently testing this (gallery-dl --ignore-config --verbose) and at ~400 images I've yet to encounter an error.

ghost commented 4 years ago

The images are being downloaded to be used in a dataset for training an AI model for upscaling anime styled images.

Anyhow, running with "gallery-dl "https://danbooru.donmai.us/posts?tags=azur_lane" --verbose", it stops here:

"# .\gallery-dl\danbooru\azur_lane\danbooru_3818910_9d109b6197ea8b8316370da4d0b7fe9b.png"

And here's the error using the verbose flag (I did originally use the verbose flag when I made the post, but I forgot to mention it. Oops.):

[danbooru][error] An unexpected error occurred: KeyError - 'id'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[danbooru][debug]
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\gallery_dl\job.py", line 67, in run
    for msg in self.extractor:
  File "c:\python38\lib\site-packages\gallery_dl\extractor\danbooru.py", line 52, in items
    for post in self.posts():
  File "c:\python38\lib\site-packages\gallery_dl\extractor\danbooru.py", line 96, in _pagination
    params["page"] = "b{}".format(posts[-1]["id"])
KeyError: 'id'

kattjevfel commented 4 years ago

The purpose of the --verbose flag is to print the exact Python version and OS; they are shown at the top of the output.

Anyway, the # means it was downloaded, so the error is for the next one in line. I guess the output just isn't clear enough on this, and testing this myself (first with --simulate, then with --no-download) ~~I stop at two different entries (the first time was the same one you stopped at). I tried using --range to make it skip roughly to the erroring area, but it really doesn't seem to want to work with this.~~

I found the exact entry, but I can't figure out how to get the proper ID from it. gallery-dl --simulate --ignore-config --verbose https://danbooru.donmai.us/posts\?tags\=azur_lane --range 9997 is reproducible, though, and gives the same error.

[gallery-dl][debug] Version 1.15.0-dev
[gallery-dl][debug] Python 3.8.5 - Linux-5.8.9-zen2-1-zen-x86_64-with-glibc2.2.5
[gallery-dl][debug] requests 2.24.0 - urllib3 1.25.10
[gallery-dl][debug] Starting SimulationJob for 'https://danbooru.donmai.us/posts?tags=azur_lane'
[danbooru][debug] Using DanbooruTagExtractor for 'https://danbooru.donmai.us/posts?tags=azur_lane'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): danbooru.donmai.us:443
[urllib3.connectionpool][debug] https://danbooru.donmai.us:443 "GET /posts.json?tags=azur_lane&limit=200&page=50 HTTP/1.1" 200 None
[danbooru][error] An unexpected error occurred: KeyError - 'id'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[danbooru][debug] 
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gallery_dl/job.py", line 67, in run
    for msg in self.extractor:
  File "/usr/lib/python3.8/site-packages/gallery_dl/extractor/danbooru.py", line 52, in items
    for post in self.posts():
  File "/usr/lib/python3.8/site-packages/gallery_dl/extractor/danbooru.py", line 96, in _pagination
    params["page"] = "b{}".format(posts[-1]["id"])
KeyError: 'id'

EDIT: Looking at https://danbooru.donmai.us/posts.json?tags=azur_lane&limit=200&page=50 and looking at the last entry shows that it lacks an ID and has "is_banned": true.

  {
    "created_at": "2020-05-02T13:09:59.253-04:00",
    "uploader_id": 351692,
    "score": 42,
    "source": "https://twitter.com/ksoo420/status/1256450801184018434",
    "last_comment_bumped_at": null,
    "rating": "e",
    "image_width": 1500,
    "image_height": 1059,
    "tag_string": "1boy 1girl antenna_hair azur_lane bad_id bad_twitter_id bangs banned_artist black_choker black_panties black_ribbon blush breasts brown_eyes censored choker cleavage clothed_sex collarbone cowgirl_position cum cum_in_pussy dutch_angle eyebrows_visible_through_hair girl_on_top hair_between_eyes hair_ribbon heavy_breathing hetero iron_cross jewelry lactation large_breasts long_hair long_sleeves looking_at_viewer mole mole_on_breast mosaic_censoring multicolored_hair navel necklace nipple_tweak nipples open_clothes open_mouth open_shirt panties panties_aside penis prinz_eugen_(azur_lane) pussy red_hair ribbon sex shirt silver_hair slit_pupils solo_focus spread_legs squatting_cowgirl_position straddling streaked_hair suru_(ksoo420) sweat swept_bangs two_side_up underwear vaginal very_long_hair white_shirt",
    "is_note_locked": false,
    "fav_count": 87,
    "last_noted_at": null,
    "is_rating_locked": false,
    "parent_id": 3892255,
    "has_children": false,
    "approver_id": null,
    "tag_count_general": 65,
    "tag_count_artist": 2,
    "tag_count_character": 1,
    "tag_count_copyright": 1,
    "file_size": 394962,
    "is_status_locked": false,
    "pool_string": "",
    "up_score": 42,
    "down_score": 0,
    "is_pending": false,
    "is_flagged": false,
    "is_deleted": false,
    "tag_count": 71,
    "updated_at": "2020-08-30T03:56:51.044-04:00",
    "is_banned": true,
    "pixiv_id": null,
    "last_commented_at": null,
    "has_active_children": false,
    "bit_flags": 2,
    "tag_count_meta": 2,
    "has_large": true,
    "has_visible_children": false,
    "is_favorited": false,
    "tag_string_general": "1boy 1girl antenna_hair bangs black_choker black_panties black_ribbon blush breasts brown_eyes censored choker cleavage clothed_sex collarbone cowgirl_position cum cum_in_pussy dutch_angle eyebrows_visible_through_hair girl_on_top hair_between_eyes hair_ribbon heavy_breathing hetero iron_cross jewelry lactation large_breasts long_hair long_sleeves looking_at_viewer mole mole_on_breast mosaic_censoring multicolored_hair navel necklace nipple_tweak nipples open_clothes open_mouth open_shirt panties panties_aside penis pussy red_hair ribbon sex shirt silver_hair slit_pupils solo_focus spread_legs squatting_cowgirl_position straddling streaked_hair sweat swept_bangs two_side_up underwear vaginal very_long_hair white_shirt",
    "tag_string_character": "prinz_eugen_(azur_lane)",
    "tag_string_copyright": "azur_lane",
    "tag_string_artist": "banned_artist suru_(ksoo420)",
    "tag_string_meta": "bad_id bad_twitter_id"
  }
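
To spot entries like this one in a page of results, a minimal Python sketch follows. It assumes the JSON response has already been decoded into a list of dicts; the function name and the sample data are made up for illustration and are not part of gallery-dl.

```python
def posts_without_id(posts):
    """Return (index, post) pairs for entries that lack an 'id' key."""
    return [(i, post) for i, post in enumerate(posts) if "id" not in post]


# Invented sample data: the second entry mimics a banned post with no 'id'.
sample = [
    {"id": 1, "is_banned": False},
    {"is_banned": True, "parent_id": 1},
]

for index, post in posts_without_id(sample):
    print(index, post.get("is_banned"))
```
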

ghost commented 4 years ago

Interesting. Is there a parameter to have it ignore things like banned posts, and to just keep going with the rest of the files?

mikf commented 4 years ago

This error only occurred when the last post in a batch of 200 didn't have an id field, for example because it was deleted or banned.
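
As a rough illustration of that failure mode, and not necessarily how gallery-dl itself handles it, the sketch below builds the "b<id>" page value seen in the traceback from the last post that actually has an id, instead of always reading posts[-1]["id"]. The function name and the sample batch are hypothetical.

```python
def next_page_param(posts):
    """Build the 'b<id>' page value from the last post that has an 'id'.

    Returns None when no post in the batch carries an 'id'.
    """
    for post in reversed(posts):
        if "id" in post:
            return "b{}".format(post["id"])
    return None


# Invented batch whose final entry lacks 'id', as in the reported case.
batch = [{"id": 30}, {"id": 20}, {"is_banned": True}]
print(next_page_param(batch))  # b20
```

Scanning backwards keeps the pagination cursor as close to the end of the batch as possible; whether skipping the id-less entry is the right behaviour for a given use case is a separate question.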