Real-Ergan Model

manuelernestog commented 10 months ago

Hi Kevin, first of all, congratulations on the work you have done with UpscalerJS.

The models work great, but the only scenario where you have some problems is with the transparent PNGs because the clipping does not persist and a black background is displayed around it. A few days ago I came across this Real-ESRGAN model, which can process them. I wonder if it would be possible to integrate it into UpscalerJS.

I was reviewing some of the projects built with it, and there is one made with Electron, but looking at its model files, they use a different extension from the ones used in UpscalerJS.

Here is the link to the repository with the models. Regards,

Model: https://github.com/xinntao/Real-ESRGAN/

Electron GUI: https://github.com/upscayl/upscayl

thekevinscott commented 10 months ago

I'd be happy to look into this - will report back soon.

thekevinscott commented 6 months ago

I spent some time looking into this.

The challenge with this model (and, frankly, most modern models) is that it's built on top of PyTorch, while UpscalerJS sits on top of TensorFlow.js. Translating models from PyTorch to TensorFlow is non-trivial.
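As a small illustration of why a port is more than a file-format change, here is a minimal, dependency-free sketch of just one mechanical step: reordering convolution weights. PyTorch stores `Conv2d` weights as (out_channels, in_channels, kernel_height, kernel_width), while TensorFlow's `Conv2D` and TF.js's `tf.conv2d` expect (kernel_height, kernel_width, in_channels, out_channels). The helper names (`shape4d`, `torch_conv_to_tf`) are hypothetical and not part of UpscalerJS or any converter; the sketch assumes ordinary (non-grouped) convolutions.

```python
"""Sketch: reorder a PyTorch Conv2d weight (OIHW) into TensorFlow's HWIO layout.

Hypothetical helpers for illustration only; assumes groups=1 convolutions and
represents tensors as rectangular 4-D nested lists to avoid dependencies.
"""
from typing import List, Tuple

Tensor4D = List[List[List[List[float]]]]


def shape4d(t: Tensor4D) -> Tuple[int, int, int, int]:
    """Return the shape of a rectangular 4-D nested list."""
    return (len(t), len(t[0]), len(t[0][0]), len(t[0][0][0]))


def torch_conv_to_tf(weight: Tensor4D) -> Tensor4D:
    """Transpose (out, in, kh, kw) -> (kh, kw, in, out): result[h][w][i][o] == weight[o][i][h][w]."""
    out_c, in_c, kh, kw = shape4d(weight)
    return [
        [
            [[weight[o][i][h][w] for o in range(out_c)] for i in range(in_c)]
            for w in range(kw)
        ]
        for h in range(kh)
    ]


if __name__ == "__main__":
    # Toy weight: 2 output channels, 3 input channels, 1x2 kernel.
    w = [[[[o * 100 + i * 10 + x for x in range(2)] for _h in range(1)]
          for i in range(3)] for o in range(2)]
    k = torch_conv_to_tf(w)
    print(shape4d(k))  # (1, 2, 3, 2)
```

Even this step has traps a real port would need to handle: transposed and depthwise convolutions use other layouts, and TensorFlow's `"same"` padding puts any odd extra padding on the bottom/right, whereas PyTorch's integer `padding` is symmetric, so outputs can drift unless padding is made explicit.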

Real-ESRGAN in particular is built on top of an additional Python library, basicsr, which I presume would also need to be translated into JavaScript.

So the short answer is no: I don't think it will be feasible to convert Real-ESRGAN.

(To be clear, I think it is probably technically feasible, but it is more work than I realistically have time for. For comparison, I spent about six months converting the DDNM family of models from PyTorch to TFJS before I had a working prototype.)

I realize that's a crappy answer. Some more thoughts:

I actually wrote a post to the TensorFlow team a year ago about the difficulties of using PyTorch models in TFJS; it spawned a healthy discussion but no actionable steps.

Since UpscalerJS was released in 2020, there have been a number of new developments:

- The research community has basically settled on PyTorch (with maybe some JAX, for Google-centric research).
- Super resolution as a subdomain, like a lot of areas of ML, is blurring with other domains. For instance, is Stable Diffusion "super resolution"? What about something like "Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization", which uses Stable Diffusion to help guide the super resolution for better clarity? Not all of these emerging domains are well supported in TensorFlow.js.
- There are other compelling alternatives to TensorFlow.js that didn't exist back in 2020, such as Transformers.js, MLC-AI (which I think uses TVM), and Candle and Burn, which are written in Rust and target WASM. Transformers.js in particular sits on top of ONNX, which I found quite immature back in 2020 but which now offers pretty seamless support for PyTorch models.

It's frustrating to see new research released at this pace and not be able to use those models in this library. I'd love for things to be as simple as pointing this library at a Hugging Face model, converting it automatically, and opening this library up to the full ecosystem. I've started researching ways we might enable that, though I don't have anything concrete to share yet.

thekevinscott commented 6 months ago

Responding to a specific comment in your post:

"the only scenario where you have some problems is with the transparent PNGs because the clipping does not persist and a black background is displayed around it."

We have had questions about alpha transparency before. The current models could support alpha transparency, but they would need to be retrained from scratch on 4-channel PNGs (covering a diverse range of transparency).
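To make the failure mode concrete: a model trained on 3-channel images only ever sees RGB, so the alpha channel is discarded on the way in, and fully transparent pixels come back as whatever RGB they happened to store (often black). The minimal sketch below shows that, plus a common stopgap of upscaling the alpha channel separately and recombining it. This is not the retraining approach described here, and it is not UpscalerJS's API: `upscale_rgb` is a hypothetical stand-in for an RGB-only model (here just nearest-neighbor resampling), and images are nested lists of RGBA tuples so it runs without dependencies.

```python
"""Sketch: why an RGB-only upscaler loses transparency, and a split-alpha stopgap.

`upscale_rgb` is a hypothetical stand-in for a 3-channel model; it simply does
nearest-neighbor resampling so the example is self-contained.
"""
from typing import List, Tuple, TypeVar

T = TypeVar("T")
RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def nearest_upscale(img: List[List[T]], scale: int) -> List[List[T]]:
    """Repeat every pixel `scale` times horizontally and every row `scale` times vertically."""
    return [[px for px in row for _ in range(scale)] for row in img for _ in range(scale)]


def upscale_rgb(img: List[List[RGB]], scale: int) -> List[List[RGB]]:
    """Hypothetical RGB-only model: accepts and returns 3-channel pixels only."""
    return nearest_upscale(img, scale)


def naive_upscale(img: List[List[RGBA]], scale: int) -> List[List[RGB]]:
    """Drop alpha before the model, so the transparency mask is lost."""
    rgb = [[(r, g, b) for r, g, b, _a in row] for row in img]
    return upscale_rgb(rgb, scale)


def split_alpha_upscale(img: List[List[RGBA]], scale: int) -> List[List[RGBA]]:
    """Upscale RGB with the model, resize alpha separately, then recombine."""
    rgb = [[(r, g, b) for r, g, b, _a in row] for row in img]
    alpha = [[a for _r, _g, _b, a in row] for row in img]
    up_rgb = upscale_rgb(rgb, scale)
    up_alpha = nearest_upscale(alpha, scale)  # a plain resize, not a learned model
    return [
        [(r, g, b, a) for (r, g, b), a in zip(rgb_row, a_row)]
        for rgb_row, a_row in zip(up_rgb, up_alpha)
    ]


if __name__ == "__main__":
    # 1x2 image: an opaque red pixel next to a fully transparent one stored as black.
    img = [[(255, 0, 0, 255), (0, 0, 0, 0)]]
    print(naive_upscale(img, 2)[0])        # [(255, 0, 0), (255, 0, 0), (0, 0, 0), (0, 0, 0)]
    print(split_alpha_upscale(img, 2)[0])  # [(255, 0, 0, 255), (255, 0, 0, 255), (0, 0, 0, 0), (0, 0, 0, 0)]
```

The split approach preserves the mask, but the alpha channel is only resized, never enhanced, so transparent edges won't gain the detail the RGB content does. Training on 4-channel data is what would let the model itself handle transparency.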

(I just realized I've never posted my ESRGAN training script publicly. It's a bit messy, but if you're interested, I'd be happy to share it. I believe I trained on an RTX 3090, and each model took a couple of days to train.)

I think the biggest challenge is finding a solid dataset to train on. The ESRGAN models are trained on DIV2K, so ideally you'd want an alpha-transparent dataset of similar size and diversity.
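To make the dataset question a little more concrete, here is a minimal sketch of two chores a 4-channel training set would likely involve: checking how varied its alpha channel actually is, and producing low-resolution/high-resolution (LR/HR) pairs. Assumptions, none from this thread: images are nested lists of RGBA tuples, the helper names `alpha_profile` and `box_downscale` are hypothetical, and LR images come from a simple box filter, whereas DIV2K's standard LR images are bicubic downscales.

```python
"""Sketch: vet a candidate RGBA dataset and build LR/HR training pairs.

Hypothetical helpers; images are nested lists of (r, g, b, a) tuples in 0-255,
and the box filter is a simplification of the bicubic downscaling DIV2K uses.
"""
from typing import Dict, Iterable, List, Tuple

RGBA = Tuple[int, int, int, int]
Image = List[List[RGBA]]


def alpha_profile(images: Iterable[Image]) -> Dict[str, float]:
    """Fraction of pixels that are fully transparent, partially transparent, or opaque."""
    counts = {"transparent": 0, "partial": 0, "opaque": 0}
    for img in images:
        for row in img:
            for *_rgb, a in row:
                if a == 0:
                    counts["transparent"] += 1
                elif a == 255:
                    counts["opaque"] += 1
                else:
                    counts["partial"] += 1
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}


def box_downscale(img: Image, factor: int) -> Image:
    """Average each factor x factor block over all four channels (HR -> LR)."""
    height, width = len(img) // factor, len(img[0]) // factor
    out: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            n = len(block)
            row.append(tuple(round(sum(px[c] for px in block) / n) for c in range(4)))
        out.append(row)
    return out


if __name__ == "__main__":
    hr = [[(200, 40, 0, 255), (200, 40, 0, 255)],
          [(0, 0, 0, 1), (0, 0, 0, 1)]]
    print(box_downscale(hr, 2))  # [[(100, 20, 0, 128)]]
    print(alpha_profile([hr]))   # {'transparent': 0.0, 'partial': 0.5, 'opaque': 0.5}
```

Averaging straight (non-premultiplied) RGBA, as this sketch does, lets the hidden color of transparent pixels bleed into edges; premultiplying RGB by alpha before downscaling is a common refinement.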

thekevinscott / UpscalerJS

Real-Ergan Model #1273