mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Pixiv's ugoira zip files are actually samples #6056

Open hdk5 opened 3 weeks ago

hdk5 commented 3 weeks ago

The original (or at least less heavily compressed) frames can be obtained from https://i.pximg.net/img-original/img/20xx/xx/xx/xx/xx/xx/{id}_ugoira{n}.{ext} URLs, where n is in [0, number-of-frames). The number of frames can be taken from the ugoira metadata API.
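The URL pattern above can be expanded mechanically once the upload date path, the work id, the extension and the frame count are known. The following is a minimal sketch under those assumptions; the function and parameter names are hypothetical, and the frame count of 3 in the usage part is only a placeholder, not the real count for this work.

```python
# Sketch only: the host and path layout follow the URL pattern quoted above.
# The function and parameter names are hypothetical, not gallery-dl's API.
PXIMG_ORIGINAL = "https://i.pximg.net/img-original/img"


def original_frame_urls(date_path, illust_id, ext, frame_count):
    """Return one URL per frame, for n in [0, frame_count)."""
    if frame_count < 0:
        raise ValueError("frame_count must be non-negative")
    prefix = f"{PXIMG_ORIGINAL}/{date_path.strip('/')}/{illust_id}_ugoira"
    suffix = ext.lstrip(".")
    return [f"{prefix}{n}.{suffix}" for n in range(frame_count)]


if __name__ == "__main__":
    # Placeholder frame count; the real one comes from the ugoira metadata.
    for url in original_frame_urls("2022/09/04/23/54/19", 101003492, "png", 3):
        print(url)
```

The half-open range matches n=[0; number-of-frames), so the last frame index is always the frame count minus one.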

For example, https://www.pixiv.net/artworks/101003492:

- 101003492_ugoira1920x1080.zip: JPEG q=90, downscaled to 1080x1080, no transparency, visible artifacts inside/around "pixels"
- 101003492_ugoira0.png: PNG lossless, 1210x1210, transparent background, no artifacts

This applies not only to images with dimensions above 1920x1080, and it affects JPEG frames too. Another example, https://www.pixiv.net/artworks/116464967:

- 116464967_ugoira1920x1080.zip: JPEG q=90, same artifacts
- 116464967_ugoira0.jpg: JPEG q=99, feels lossless, no artifacts on pixel art

I used low-palette pixel art works as examples since the difference is easier to see there. In practice, most artists just convert their animated works to GIF, and pixiv then converts that to JPEG frame by frame, so some re-encoding is unavoidable either way.

See the Danbooru pull request https://github.com/danbooru/danbooru/pull/5793 for additional details.

mikf commented 1 week ago

Downloading "original" frames is now possible by setting ugoira to "original" / -o ugoira=original (https://github.com/mikf/gallery-dl/commit/9d1e5f3c9bfb933c0075dc5b050c0b90137a9f59).

Converting them to animated formats should work as well, but there might be a few bugs in edge cases, and --ugoira currently overrides -o ugoira=original (https://github.com/mikf/gallery-dl/commit/57da9ebfb5d53b3e1f729161a199d6f825a72a94).


Is there a better way to get an "original" frame's filename extension other than just guessing?

https://github.com/mikf/gallery-dl/blob/57da9ebfb5d53b3e1f729161a199d6f825a72a94/gallery_dl/extractor/pixiv.py#L107-L113

hdk5 commented 1 week ago

> Is there a better way to get an "original" frame's filename extension other than just guessing?

It is always the same as the first frame's, as pixiv won't let one upload frames of different filetypes.
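Because every frame shares the first frame's extension, the URL of frame 0 is enough to derive the extension and the URLs of the remaining frames without guessing. The sketch below assumes a frame-0 URL of the `{id}_ugoira0.{ext}` form is already available; both helper names are hypothetical and are not part of gallery-dl.

```python
import posixpath
from urllib.parse import urlsplit


def frame_extension(first_frame_url):
    """Return the extension of frame 0, without the leading dot."""
    path = urlsplit(first_frame_url).path
    ext = posixpath.splitext(path)[1]
    if not ext:
        raise ValueError("URL has no filename extension")
    return ext[1:]


def sibling_frame_url(first_frame_url, n):
    """Swap the frame index 0 for n, keeping the path and extension."""
    head, marker, tail = first_frame_url.rpartition("_ugoira0.")
    if not marker:
        raise ValueError("not a frame-0 ugoira URL")
    return f"{head}_ugoira{n}.{tail}"


if __name__ == "__main__":
    first = ("https://i.pximg.net/img-original/img/"
             "2022/09/04/23/54/19/101003492_ugoira0.png")
    print(frame_extension(first))
    print(sibling_frame_url(first, 1))
```

Splitting on the literal `_ugoira0.` marker rather than on the last digit avoids touching digits in the id or the date path.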

I can't speak for the API that gallery-dl uses, but in the ajax/illust response it is in the body.urls.original field, e.g.:

$ curl -s https://www.pixiv.net/ajax/illust/101003492 | jq ".body.urls.original"
"https://i.pximg.net/img-original/img/2022/09/04/23/54/19/101003492_ugoira0.png"