kiwix / overview

:balloon: Start here for current projects, how to get involved with offline projects, and how to join community calls. A resource for new and veteran members

Provide an AI image upscaler #97

Open JensKorte opened 8 months ago

JensKorte commented 8 months ago

To provide better image quality, it would be nice if there were an option in the menu bar to activate/deactivate AI upscaling. A GPL-3 upscaler is available at https://www.upscayl.org . As a ZIP file it is ~300 MB. Note that an AI upscaler could introduce wrong information.

Since it takes several seconds per image on an Intel Core i5 (2016), one solution would be to create a caching directory and provide an image srcset link for the upscaled images that are already available. The upscale toggle could have three modes: off; on, but only partly available for this page; and on, with all images upscaled and available.
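The srcset idea above could be sketched roughly like this (a minimal illustration, not Kiwix code: the `_4x` naming convention and the `cacheHas` lookup are hypothetical):

```typescript
// Build a srcset that always offers the original image, and adds a 4x
// entry only when an upscaled copy already exists in the cache directory.
type CacheLookup = (path: string) => boolean;

function buildSrcset(originalPath: string, cacheHas: CacheLookup): string {
  // Assumed naming scheme: "foo.png" -> "foo_4x.png" in the cache.
  const upscaledPath = originalPath.replace(/(\.\w+)$/, "_4x$1");
  const parts = [`${originalPath} 1x`];
  if (cacheHas(upscaledPath)) {
    parts.push(`${upscaledPath} 4x`);
  }
  return parts.join(", ");
}
```

Pages rendered this way degrade gracefully: browsers fall back to the `1x` entry when no upscaled copy has been produced yet.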

A simple solution would be for the user to reload the page manually. A further step could be a script behind the partly-available upscale icon that polls once every 5 seconds to check whether all images have finished upscaling, and shows a "!" once they are all available. The next step would be a configurable automatic reload once all upscaled images are ready.
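The polling step could look something like this sketch (the `check` callback is an assumed stand-in for a request to a hypothetical status endpoint; none of these names exist in Kiwix):

```typescript
// Poll until the server reports that all upscales for the page are done,
// or give up after maxTries attempts. Returns true if everything is ready.
async function waitForUpscales(
  check: () => Promise<boolean>, // e.g. fetch a status endpoint (assumed)
  intervalMs: number,            // the issue suggests 5000 ms
  maxTries: number
): Promise<boolean> {
  for (let i = 0; i < maxTries; i++) {
    if (await check()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

On success, the caller could either flip the icon to "!" or trigger the configurable reload mentioned above.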

Rules for pruning the caching dir could be: 1) Is the content bigger than the maximum cache dir size? a) Remove the oldest entries whose ZIM file is not loaded. b) Otherwise remove the oldest entries. 2) Is an image older than four weeks (or a configured time)? Remove it.
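Those two eviction rules could be sketched as follows (the `CacheEntry` shape and all field names are illustrative assumptions, not an existing Kiwix structure):

```typescript
// Hypothetical eviction sketch for the upscale cache described above.
interface CacheEntry {
  path: string;
  bytes: number;
  mtimeMs: number;    // last access time
  zimLoaded: boolean; // is the source ZIM currently open?
}

function prune(
  entries: CacheEntry[],
  maxBytes: number,
  maxAgeMs: number,
  nowMs: number
): CacheEntry[] {
  // Rule 2: drop anything older than the configured age.
  const kept = entries.filter((e) => nowMs - e.mtimeMs <= maxAgeMs);
  // Rule 1: while over budget, evict oldest-first, preferring entries
  // whose ZIM file is not currently loaded.
  kept.sort(
    (a, b) => Number(a.zimLoaded) - Number(b.zimLoaded) || a.mtimeMs - b.mtimeMs
  );
  let total = kept.reduce((sum, e) => sum + e.bytes, 0);
  const survivors: CacheEntry[] = [];
  for (const e of kept) {
    if (total > maxBytes) {
      total -= e.bytes; // evict this entry
      continue;
    }
    survivors.push(e);
  }
  return survivors;
}
```

The sort key makes rule 1a strictly preferred over 1b: unloaded-ZIM entries are always evicted before entries whose ZIM is open.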

(I converted the original d.webp from WP-de to PNG since GitHub doesn't support webp)

https://library.kiwix.org/content/wikipedia_de_all_maxi/I/Germany_in_the_European_Union_on_the_globe_(Europe_centered).svg.png.webp

(attached images: d, d_upscayl_4x_realesrgan-x4plus)

JensKorte commented 8 months ago

One ugly example and one good one.

original: https://library.kiwix.org/content/wikipedia_de_all_maxi/I/18.07.2023_-_Reuni%C3%A3o_com_o_Chanceler_da_Rep%C3%BAblica_Federal_da_Alemanha%252C_Olaf_Scholz_-_53055897555_(cropped).jpg.webp (attached image: olaf2_upscayl_4x_realesrgan-x4plus)

original: https://library.kiwix.org/content/wikipedia_de_all_maxi/I/2019-09-10_SPD_Regionalkonferenz_Team_Geywitz_Scholz_by_OlafKosinsky_MG_2562.jpg.webp (attached image: olaf_upscayl_4x_realesrgan-x4plus)

Jaifroid commented 7 months ago

It seems like a lot of extra work for the server for a small gain that would only be relevant if the user zooms in on a page or image. Even then, what they're seeing in an AI-upscaled image is essentially invented detail, albeit based on a best high-probability guess. It wouldn't actually reproduce the exact detail lost in the original downscaling process. We could end up either with fake-looking images or, in the worst case, with hallucinated detail. Call me cynical...

kelson42 commented 7 months ago

@JensKorte Thank you for your ticket. It's an interesting one. I'm not sure I fully understand your use case, but if I get it right, it would be a software solution to allow better, dynamically improved images without actually packing high-quality images in the ZIM. That would imply embedding the software solution in Kiwix, and considering this is all in TypeScript... this is not something very simple... But it could be considered in the longer term. I will move this ticket to kiwix/overview.

Jaifroid commented 4 months ago

@kelson42 My view is that this issue should be closed as not planned. We live in an era where AI is eroding the difference between reality and fabulation, and upscaling low-resolution images, while a neat party trick, can only add invented detail, which in my view corrupts what one is seeing. If we want higher-resolution images, we can always do that by decreasing the amount of downscaling at scrape time, at least for smaller Wikipedia ZIMs, and this would be a much more accurate way of undoing the loss of detail.

kelson42 commented 4 months ago

> @kelson42 My view is that this issue should be closed as not planned. We live in an era where AI is eroding the difference between reality and fabulation, and upscaling low-resolution images, while a neat party trick, can only add invented detail, which in my view corrupts what one is seeing. If we want higher-resolution images, we can always do that by decreasing the amount of downscaling at scrape time, at least for smaller Wikipedia ZIMs, and this would be a much more accurate way of undoing the loss of detail.

@Jaifroid Very strong statement. I have no strong opinion on this, even if I believe I would not go so far. But we would definitely need to clearly state that such a picture has been partly „invented“.

Anyway, at this stage, the problem is mostly technical. We would need a library able to do so, in a native format… that could be put into Kiwix… before even considering using it.

Jaifroid commented 4 months ago

@kelson42 Sorry, I'm just getting increasingly worried about the proliferation of fake imagery. I think a "unique selling point" for Kiwix (at least its major offline Wikipedia role) is that unlike AI, Offline Wikipedia provides content you can rely on not to be contaminated with hallucination, and that includes hallucinated image detail. But of course this is just my opinion. I think it would be worth having some kind of discussion about such things!

kelson42 commented 4 months ago

> I think it would be worth having some kind of discussion about such things!

@Jaifroid Definitely, this is worth it. We can even make a policy about it.

JensKorte commented 4 months ago

One way to be sure that the upscaling is alright would be a reproducible upscaling algorithm: after compressing the image, run the reproducible upscaler, automatically compare the result to the original hi-res image, and if the comparison is OK, add e.g. an XMP comment to the file recording that upscaling with scaler x (and its version) at upscaling factor y seems to be OK. A hash of the upscaled image could also be included.
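The verify-then-tag scheme above could be sketched like this (everything here is an illustrative assumption: raw same-size pixel buffers, a mean-absolute-difference metric, and an upscaler that is actually deterministic):

```typescript
import { createHash } from "node:crypto";

type Image = Uint8Array; // raw pixel bytes, identical layout assumed

// Crude similarity metric standing in for a real perceptual comparison.
function meanAbsDiff(a: Image, b: Image): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / a.length;
}

// Upscale deterministically, compare against the original hi-res image,
// and on success return a hash that could go into an XMP comment
// alongside the scaler name, version, and factor.
function verifyUpscale(
  upscale: (img: Image) => Image, // must be deterministic for this to work
  downscaled: Image,
  originalHiRes: Image,
  threshold: number
): { ok: boolean; hash?: string } {
  const up = upscale(downscaled);
  if (up.length !== originalHiRes.length) return { ok: false };
  if (meanAbsDiff(up, originalHiRes) > threshold) return { ok: false };
  return { ok: true, hash: createHash("sha256").update(up).digest("hex") };
}
```

The hash only proves anything if the client-side upscaler reproduces the scrape-time output byte for byte, which is exactly the determinism question Jaifroid raises below.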

Jaifroid commented 4 months ago

@JensKorte It's an interesting idea. But correct me if I'm wrong: if we're using AI to upscale, then it's non-deterministic, right? The AI will add detail according to its input prompt (in this case an image) and its specific training, and we might get subtly different upscaled images each time. The upscaling done client-side might never be exactly the same as the upscaling done at scrape time to store a hash...

I may be extrapolating from the way language models work, as opposed to Stable Diffusion, etc.

I'm also slightly worried about the compute power required. Maybe it's small for upscaling as opposed to making a new image from a text or image prompt.