mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.89k stars, 976 forks

[e621] Downloading stops after first page #5798

Closed (tnovak007 closed this issue 4 months ago)

tnovak007 commented 4 months ago

Hi!

Something seems to be wrong with e621: https://e621.net/posts?tags=equid+-my_little_pony downloads only 319 images and then stops without an error.

<redacted>gallery-dl --ignore-config -c <redacted>.json "https://e621.net/posts?tags=equid+-my_little_pony" -d "<redacted>\\E621\\" -v
[gallery-dl][debug] Version 1.27.0-dev
[gallery-dl][debug] Python 3.12.3 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['<redacted>.json']
[gallery-dl][debug] Starting DownloadJob for 'https://e621.net/posts?tags=equid+-my_little_pony'
[root][debug] Using E621TagExtractor for 'https://e621.net/posts?tags=equid+-my_little_pony'
[root][debug] Loading cookies from '<redacted>.txt'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): e621.net:443
[urllib3.connectionpool][debug] https://e621.net:443 "GET /posts.json?tags=equid+-my_little_pony&limit=320 HTTP/1.1" 200 None
[root][debug] Using download archive '<redacted>\gallery-dl\archives\e621.sqlite3'
[root][debug] Active postprocessor modules: [MetadataPP]

I think the issue is that https://e621.net/posts.json?tags=equid+-my_little_pony&limit=320 returns only 319 items...

Strangely, https://e621.net/posts?tags=equid+my_little_pony works fine...

Thank you.

mikf commented 4 months ago

See extractor.e621.threshold. The current default (320) should probably be lowered (to 300? 160? 1?).

tnovak007 commented 4 months ago

Hi, thanks for your suggestion. I tried lowering it, but even at 70 it still returns only 69 items and stops prematurely. 50 is the first value that works.

I think it's an e621 bug, because (for values above 50) the response always contains exactly one item fewer than the threshold. It seems some bugged picture gets skipped in the JSON response, and if the pagination logic stops whenever fewer pictures are returned than the threshold (line 146 in danbooru.py, I think), that one missing item triggers the early stop.

I think the proof is that the second example I wrote works fine with the default 320 value.

Do you think it would be possible to change the pagination logic so that it detects the real end of the results (the last page) in some way other than receiving fewer pictures than the threshold value? Maybe something like: If threshold value in config is "auto" or empty try next page and if it returns 0 results then stop (else continue)... Something like what DeviantArt has on "manual" (checking the "next_cursor" data).

Thank you.
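
For illustration, the two stopping rules discussed above can be sketched roughly as follows. This is a simplified model, not gallery-dl's actual code: `paginate`, `paginate_until_empty`, `fake_fetch`, and the page contents are all hypothetical, and the fake server simply returns one post fewer than requested on its first page.

```python
from typing import Callable, Iterator, List

Fetch = Callable[[int], List[int]]


def paginate(fetch_page: Fetch, threshold: int) -> Iterator[int]:
    """Stop as soon as a page holds fewer than `threshold` posts."""
    page = 1
    while True:
        posts = fetch_page(page)
        yield from posts
        if len(posts) < threshold:
            return
        page += 1


def paginate_until_empty(fetch_page: Fetch) -> Iterator[int]:
    """Stop only when a page comes back with no posts at all."""
    page = 1
    while True:
        posts = fetch_page(page)
        if not posts:
            return
        yield from posts
        page += 1


# Hypothetical data: 320 posts are requested per page, but page 1 is one short.
FAKE_PAGES = {
    1: list(range(0, 319)),
    2: list(range(319, 639)),
    3: list(range(639, 700)),
}


def fake_fetch(page: int) -> List[int]:
    return FAKE_PAGES.get(page, [])


short_stop = list(paginate(fake_fetch, threshold=320))
full_run = list(paginate_until_empty(fake_fetch))
print(len(short_stop), len(full_run))  # 319 700
```

The second variant costs one extra request at the end of every search, which is presumably why a threshold-based rule is attractive when the server reliably fills each page.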

mikf commented 4 months ago

This problem is caused by how e621 and co. handle negative tag searches, I think.

> Maybe something like: If threshold value in config is "auto" or empty try next page and if it returns 0 results then stop (else continue)

You can set threshold to 0 to get this behavior.
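
As a rough sketch, the corresponding configuration entry could be generated like this. The nesting under `extractor` and `e621` is an assumption inferred from the option name extractor.e621.threshold mentioned earlier in the thread; check the gallery-dl configuration docs before relying on it.

```python
import json

# Assumption: the dotted option name "extractor.e621.threshold" maps to
# nested JSON objects in the gallery-dl configuration file.
config = {
    "extractor": {
        "e621": {
            "threshold": 0
        }
    }
}

# Print the fragment so it can be merged into an existing config file.
print(json.dumps(config, indent=4))
```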

tnovak007 commented 4 months ago

Hi!

> You can set threshold to 0 to get this behavior.

Oh, thank you, I didn't know that. It really works! Problem solved! I didn't try this because the docs state that "The value cannot be less than 1." (and I didn't look at the code any further).

> This problem is caused by how e621 and co. handle negative tag searches, I think.

I don't think so, because this search (and others with negative tags) worked a month ago and earlier. It's probably a bug with some particular image. But the workaround works, so it's no longer a problem for me!

Thank you again for your support and this great tool!