mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[NSFW] Sankaku Complex image booru's new IDs #5073

Open arisboch opened 8 months ago

arisboch commented 8 months ago

A few days or weeks ago, the Sankaku image booru changed its IDs from purely numeric to alphanumeric (among other apparent changes to its API). Downloads now sometimes fail with various errors, or succeed but with nonsensical IDs. Here are a few examples:

[gallery-dl][debug] Version 1.26.7-dev
[gallery-dl][debug] Python 3.11.6 - Linux-6.5.0-14-generic-x86_64-with-glibc2.38
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://sankaku.app/posts/vkr3g3gN2rZ'
[gallery-dl][error] Unsupported URL 'https://sankaku.app/posts/vkr3g3gN2rZ'

An attempt to download posts containing more... extreme content throws the following error message, even though I have a premium account and can view these kinds of posts in the browser without problems:

[gallery-dl][debug] Version 1.26.7-dev
[gallery-dl][debug] Python 3.11.6 - Linux-6.5.0-14-generic-x86_64-with-glibc2.38
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://sankaku.app/posts/[redacted]'
[sankaku][debug] Using SankakuPostExtractor for 'https://sankaku.app/posts/[redacted]'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): capi-v2.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://capi-v2.sankakucomplex.com:443 "GET /posts?lang=en&page=1&limit=1&tags=id_range%3A9 HTTP/1.1" 404 105
[sankaku][error] snackbar__content-belongs-to-premium-client

Another post with extreme content throws the following error:

[gallery-dl][debug] Version 1.26.7-dev
[gallery-dl][debug] Python 3.11.6 - Linux-6.5.0-14-generic-x86_64-with-glibc2.38
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://sankaku.app/posts/[redacted]'
[sankaku][debug] Using SankakuPostExtractor for 'https://sankaku.app/posts/[redacted]'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): capi-v2.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://capi-v2.sankakucomplex.com:443 "GET /posts?lang=en&page=1&limit=1&tags=id_range%3Ae HTTP/1.1" 408 157
[sankaku][error] snackbar__account_search-invalid-structure

mikf commented 8 months ago

Posts with alphanumeric IDs are now supported (https://github.com/mikf/gallery-dl/commit/a416d4c3d5b5b315342d19514085006c47bcc323). This might also fix the other [sankaku][error] snackbar__… errors, not sure though.

arisboch commented 8 months ago

Thanks a bunch, that did the trick.

arisboch commented 8 months ago

Could you also please add a directory and file template field for the new post IDs?

mikf commented 8 months ago

I can't figure out how to do this in general. I had hoped that these new IDs could somehow be derived by converting an existing value (numeric ID, MD5) to another format/base, but that doesn't seem to work.

New IDs are all 11 characters long and in base 62 (ASCII letters and digits) from what I can tell.

We could obviously just take the given ID for post URLs, but it would be missing for everything else like tag searches.
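
To illustrate the observation about the ID shape, here is a small Python sketch. The helper names `looks_like_new_id` and `to_base62` are made up for this example, and the digit order of the base-62 alphabet is an assumption, since the site's actual scheme is not known.

```python
import re
import string

# Assumed alphabet order (digits, lowercase, uppercase); the real order is unknown.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

# New-style IDs: exactly 11 ASCII letters or digits.
NEW_ID = re.compile(r"[0-9A-Za-z]{11}")


def looks_like_new_id(value):
    """Return True if value has the shape of a new alphanumeric post ID."""
    return NEW_ID.fullmatch(value) is not None


def to_base62(number):
    """Plain base-62 encoding of a non-negative integer."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(looks_like_new_id("vkr3g3gN2rZ"))  # ID taken from the URL in the first log
print(len(to_base62(62**5 - 1)))         # length of the largest 5-digit base-62 value
```

Any integer below 62**5 (about 916 million) encodes to at most five base-62 characters, so a plain base conversion of such a number cannot produce an 11-character ID on its own.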

mikf commented 7 months ago

So I figured out how to get alphanumeric IDs (https://github.com/mikf/gallery-dl/commit/34a4ddc3996b19cacea8c8c88a432ec9104a068b, "id-format": "alnum"), but you can only have either new-style IDs or the old numeric IDs. Both at the same time would theoretically be possible by repeating each API call, once with Api-Version: 2 header and once without.
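
The "repeat each API call" idea could look roughly like the sketch below. This is not what gallery-dl does: `fetch` is a hypothetical callable standing in for the HTTP request, and both the assumption that the header selects the new-style IDs and the matching of the two responses by an `md5` field are guesses about the API's behavior.

```python
def fetch_with_both_ids(fetch, url):
    """Call the API twice and attach both ID styles to each post.

    fetch(url, headers) is a hypothetical callable returning a list of
    post dicts; both responses are assumed to contain an "md5" key.
    """
    # Request without the header: assumed to return old numeric IDs.
    legacy = fetch(url, {})
    # Request with the header: assumed to return new alphanumeric IDs.
    modern = fetch(url, {"Api-Version": "2"})

    # Index the new-style IDs by MD5 so the two result lists can be joined.
    alnum_by_md5 = {post["md5"]: post["id"] for post in modern}

    merged = []
    for post in legacy:
        combined = dict(post)
        # None if the second response had no matching post.
        combined["id_alnum"] = alnum_by_md5.get(post["md5"])
        merged.append(combined)
    return merged
```

The obvious cost is that every request is sent twice, which doubles the API traffic for each download.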

AlttiRi commented 6 months ago

https://forum.sankakucomplex.com/t/converting-between-legacy-numeric-new-alphanumeric-post-ids/48568

What is the algorithm or formula to convert from the old numeric post ID format to the new alphanumeric ID format?

The admin wrote:

if the algorithm was public, there wouldn’t be much point in the change, no? they use a similar hashing method to a service like youtube. the hash key is just obviously the same between Chan and Idol

mo-han commented 3 months ago

@mikf id-format=alnum doesn't seem to work for the idol site; maybe it dropped legacy IDs.

Since the ID is no longer an int, I suppose we can't use operators like > < == with id in filters anymore, right?

mikf commented 3 months ago

id-format only works on chan.sankakucomplex / sankaku.app; it never did on idol.sankakucomplex. The idol site had both id and id_alnum for as long as that was possible, but it removed numeric IDs completely some time ago (32262a048ba4fb0bcc5f21b93830228c81209e96).

It is possible to compare str with > < ==, but this is kind of pointless here since IDs are no longer in any predictable order. Comparing date values might be an alternative.
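
A short Python sketch of the difference, using made-up post data. The `date` key holding a `datetime` object and the helper name `posts_since` are assumptions for this example, not taken from the extractor.

```python
from datetime import datetime

# String comparison is lexicographic, character by character, not numeric.
print("9" > "10")  # True, because "9" sorts after "1"

# Hypothetical post metadata; the second ID and both dates are invented.
posts = [
    {"id": "vkr3g3gN2rZ", "date": datetime(2024, 1, 20)},
    {"id": "a1B2c3D4e5F", "date": datetime(2023, 11, 5)},
]


def posts_since(posts, cutoff):
    """Keep only posts whose date is on or after the cutoff."""
    return [post for post in posts if post["date"] >= cutoff]


recent = posts_since(posts, datetime(2024, 1, 1))
print([post["id"] for post in recent])
```

Filtering on the date gives a stable "newer than" condition, which the alphanumeric IDs can no longer provide.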