the-database / mpv-upscale-2x_animejanai

Real-time anime upscaling to 4k in mpv with Real-ESRGAN compact models

V2 models #5

Closed: the-database closed this issue 1 year ago

the-database commented 1 year ago

V2 models are being developed to address some feedback provided for the V1 models:

- oversharpening and sharpening artifacts, especially in backgrounds
- unwanted darkening of line art
- loss of grain and fine detail

The primary goal of the V2 models is to produce results that appear as if the source was originally produced in 4K while faithfully retaining the original look as much as possible. This will be tracked by downscaling the upscaled results to the native resolution of the anime. The results should be difficult to distinguish from the original anime source. Performing this test on the V1 models more easily reveals the above oversharpening, line darkening, and loss of detail.
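
As a rough illustration of that round-trip check, here is a minimal sketch; the file names and the use of Pillow/NumPy are placeholders and are not part of the actual evaluation workflow:

```python
# Round-trip check: downscale the upscaled output back to the source's native
# resolution and compare against the original frame. File names and libraries
# here are assumptions for this sketch only.
import numpy as np
from PIL import Image

original = Image.open("frame_source_1080p.png").convert("RGB")   # placeholder
upscaled = Image.open("frame_upscaled_4k.png").convert("RGB")    # placeholder

# Downscale the upscaled result back to the native resolution of the source.
roundtrip = upscaled.resize(original.size, Image.LANCZOS)

a = np.asarray(original, dtype=np.float64)
b = np.asarray(roundtrip, dtype=np.float64)

mse = np.mean((a - b) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
print(f"Round-trip PSNR vs. source: {psnr:.2f} dB")
```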

It's expected that V2 will not require Soft/Standard/Strong variants; there will simply be three models: V2 Compact, V2 UltraCompact, and V2 SuperUltraCompact.

As with V1, V2 will not be intended for use on low quality sources with heavy artifacting, as those artifacts will be preserved and upscaled. But with less oversharpening, V2 should perform better than V1 on low quality sources.

V2 will undoubtedly lose some sharpness when directly compared to V1. V1 models will remain available for those that prefer the extra sharp look.

Screenshots of progress on the V2 models are included in the following comments. Please note those screenshots are not final and the released V2 model may produce different results.

the-database commented 1 year ago

Comparisons to remove sharpening artifacts to produce a more natural and faithful result, especially in the backgrounds: https://slow.pics/c/lYDpIDUM

the-database commented 1 year ago

Comparisons to remove unwanted line darkening to more closely preserve the original line colors: https://slow.pics/c/MoCMk80U

the-database commented 1 year ago

Comparisons to better preserve grain and detail: https://slow.pics/c/WSHXB5VD

the-database commented 1 year ago

Comparisons of using the model twice for SD content: https://slow.pics/c/52FjK9c4

monarc99 commented 1 year ago

looks really good :)

HeartUnd3rBlade commented 1 year ago

Looking forward to these! I've been using your models every time I watch anime ever since I found them.

Razor54672 commented 1 year ago

The V2 models definitely look better in the sense of not taking it too far, because that line darkening coupled with the high sharpness gave it a Family Guy sort of look imo.

Update: When are you releasing them btw?

Anon1337Elite commented 1 year ago

I actually like the sharper ones better; not the Strong one, but the Standard one seems in a good spot. Hopefully you have a sharper model for V2 also. They look great on TVs.

the-database commented 1 year ago

I actually like the sharper ones better; not the Strong one, but the Standard one seems in a good spot. Hopefully you have a sharper model for V2 also. They look great on TVs.

Thanks for the feedback. It's nice to see that opinion of V1 on TVs because I thought the same thing when I designed V1 - I think it's a good model when viewing from a distance. V2 has different design goals from V1, which are described in detail in the first post of this issue, but in short, one goal is to reduce or eliminate sharpening artifacts as much as possible. So I'm not sure if V2 will have a sharper version, since that's difficult to do without introducing more sharpening artifacts, but I'll keep it in mind if there's demand for this.

Currently my focus is on a balanced V2 that looks good no matter how far you zoom into the image, but once that's complete, if there's demand for a sharper model I may be able to quickly train and release a "Strong" version of V2. On the other hand, maybe V1 already fulfills the same role a Strong V2 would fulfill. I do plan to keep V1 models available for those who prefer the extra sharpness.

Update: When are you releasing them btw?

The V2 models will be released as soon as they're completed. I have been working on them almost non-stop for the past two months or so, and I'm getting closer and closer to achieving all of my design goals. I have two items left that I've been working to address recently - one is better handling of scanlines, grid lines and other patterns that occur in some anime, and the other is further improving the very slight oversharpening artifacts that still exist in the latest models in specific scenarios. I believe I've just completed the first item of handling scanlines, so I'll be moving onto the last item shortly.

I prefer not to target any specific release dates since there's no guarantee that they will be met, but at the current rate of progress I think V2 should be ready for release before the end of April.

I'll post new samples of the latest progress later today.

the-database commented 1 year ago

Here are some of the newest samples: https://slow.pics/c/unTNPNX1

L4cache commented 1 year ago

May I request for 4x models?

the-database commented 1 year ago

May I request for 4x models?

I'm not sure yet but I may attempt this later. Currently you can achieve 4x by running 2x models twice, but maybe not with the best quality or performance compared to a dedicated 4x model.

I'll need a break from training models after V2 is released but will eventually investigate whether 4x models would be worth training or not.
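
For anyone who wants to experiment with the 2x-twice approach offline, here is a minimal sketch using ONNX Runtime. The model file name is a placeholder, and the 1x3xHxW float32 [0,1] input layout is an assumption about the ONNX export rather than a documented interface:

```python
# 4x by applying a 2x ONNX model twice (offline sketch, not the real-time
# mpv pipeline). Model path and the 1x3xHxW float32 [0,1] layout are assumptions.
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("2x_model.onnx")          # placeholder path
input_name = sess.get_inputs()[0].name

def upscale_2x(img: np.ndarray) -> np.ndarray:
    """img: HxWx3 float32 in [0,1]; returns the 2x result in the same layout."""
    x = np.ascontiguousarray(img.transpose(2, 0, 1))[None]   # HWC -> NCHW
    y = sess.run(None, {input_name: x})[0][0]                # first output, first batch item
    return np.clip(y.transpose(1, 2, 0), 0.0, 1.0)

src = np.asarray(Image.open("frame_sd.png").convert("RGB"), dtype=np.float32) / 255.0
out = upscale_2x(upscale_2x(src))                            # 2x applied twice = 4x
Image.fromarray((out * 255.0).round().astype(np.uint8)).save("frame_4x.png")
```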

the-database commented 1 year ago

Some comparisons of Compact vs UltraCompact vs SuperUltraCompact: https://slow.pics/c/ANUi6Xi3

The biggest differences are in the details in the background. The line sharpening on all of the models is pretty similar.

The V2 models are close to done, but they'll need a little more time before they're ready for release. In short, I'm targeting a release in the next two weeks or so.

For anyone interested in more details on the training process:

The good news is that I did find the last breakthrough that I spent nearly a month looking for, which will improve the quality of the models. The consequence is that a lot of training needs to be redone because of where the issue was found.

My current method for training these models is to first train a much larger model, and I used the SwinIR architecture for this model. The SwinIR model is much slower to run, so it can't be used for realtime playback of videos. But the SwinIR model is effective in helping to quickly train the Compact model. The SwinIR model could be considered a teacher for the Compact model.

After I completed training the SwinIR model, I used it to explore several paths in training the Compact model, and recently narrowed the results down to the best Compact models I was able to train using the SwinIR model. But they still produced flawed results in some images, where they would introduce artifacts around line art, especially for more colorful lines. I spent several days trying to track down where this issue was coming from, until I realized the issue was not in any of the training parameters - the SwinIR model itself had this issue. The top image is an example of the SwinIR model handling colorful lines poorly:

[Two stacked comparison images of the same frame: [HorribleSubs] Miira no Kaikata - 01 [1080p], frame 0024, showing artifacts around colorful line art]

So I'm currently training a new SwinIR GAN model which is showing promising results. This model has around 4 days of training left, but it might stop improving and finish early. Then I need to use the new model to prepare a new dataset to train Compact, which should take a day. Training Compact should take another day or two. Once Compact is trained, it can be used to prepare a dataset for UltraCompact and SuperUltraCompact, which should also take another day or two for both, as I can train them in parallel. Since I spent the past month optimizing the training parameters for Compact, UltraCompact, and SuperUltraCompact, I suspect I will be able to reuse them and still get the best results without spending another month trying to optimize them.

If any issue arises at any step, then work will need to be redone and all of the following steps will need to be repeated. That is basically the process I've been following during the development of V2, and that's why it's so difficult to accurately estimate any release date. Going through that process, I have probably trained over 100 V2 candidate models at this point, in search of the best quality model with the least amount of issues. I think the end result will be worth it and I'm excited to release it as soon as it's ready.
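
For anyone curious what the teacher-to-student dataset step might look like in practice, here is a heavily simplified sketch. The loader, paths, and use of PyTorch are placeholders; the actual training framework, losses, and parameters aren't shown here:

```python
# Sketch of the teacher -> student dataset step: run the trained teacher
# (SwinIR) over low-res tiles to produce pseudo-ground-truth targets for the
# student (Compact). load_teacher() and the paths are placeholders.
from pathlib import Path

import numpy as np
import torch
from PIL import Image

def load_teacher(weights_path: str) -> torch.nn.Module:
    # Placeholder: build the SwinIR architecture and load its weights with
    # whatever framework the teacher was trained in.
    raise NotImplementedError

teacher = load_teacher("swinir_teacher.pth").eval().cuda()

lr_dir = Path("dataset/lr_tiles")
gt_dir = Path("dataset/teacher_targets")
gt_dir.mkdir(parents=True, exist_ok=True)

with torch.no_grad():
    for lr_path in sorted(lr_dir.glob("*.png")):
        lr = np.asarray(Image.open(lr_path).convert("RGB"), dtype=np.float32) / 255.0
        x = torch.from_numpy(lr.transpose(2, 0, 1)).unsqueeze(0).cuda()
        y = teacher(x).clamp(0.0, 1.0)[0].cpu().numpy().transpose(1, 2, 0)
        Image.fromarray((y * 255.0).round().astype(np.uint8)).save(gt_dir / lr_path.name)

# The student (Compact) then trains on (lr_tiles, teacher_targets) pairs, and
# the finished Compact plays the same role for UltraCompact and SuperUltraCompact.
```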

the-database commented 1 year ago

Some new samples: https://slow.pics/c/NGEnUdrt

V2 is very close to ready. The SwinIR GAN path I mentioned previously was a dead end. I spent several days and eventually found another solution to that issue, but then I found a new issue - small details in backgrounds were being oversharpened and having too much noise added to them. I spent a few days on that and I'm testing a promising fix now. If it goes well, the Compact model will be complete and I'll be able to use it to train UltraCompact and SuperUltraCompact.

etetdev commented 1 year ago

I'm really looking forward to the V2! 👍

Just letting you know that the FZ screen looks weird IMO. Some lines completely disappeared; it looks way oversharpened.

The other screens look amazing though.

You're doing amazing work nonetheless! 🫡

Galahahad commented 1 year ago

Thanks for your amazing job :)

octopushugger commented 1 year ago

Might you release a v2 beta for interested parties? It's already looking better than v1.

No pressure though if you don't wanna let an unfinished product out in the wild.

the-database commented 1 year ago

Might you release a v2 beta for interested parties? It's already looking better than v1.

No pressure though if you don't wanna let an unfinished product out in the wild.

While I'd prefer to just release a completed v2 once everything is done, I'm not against doing a prerelease if the wait for the release drags on for too long, although any prerelease would just be a compact model since that's where all of the work is being done first. I wonder how much interest there is in the compact model compared to the ultracompact and superultracompact?

octopushugger commented 1 year ago

There is certainly more interest in ultra/superultra, but regular compact would still be great for non-real-time applications.

the-database commented 1 year ago

Sounds good. Mid-level cards will also be able to use compact for SD resolutions, and I'm finding that compact works well on some 1080p anime when downscaled to resolutions such as 900p, which would allow more cards to run the compact model on HD anime. Compact running on 900p downscales might even produce better quality than UltraCompact on 1080p depending on the anime, but that will need some more exploration. I'll rely on the community to experiment with that when all of the models are released.
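
As a rough sketch of that 900p idea in VapourSynth terms (the inference call is a hypothetical placeholder, since the exact filter used by the release package isn't covered here):

```python
# Pre-downscale a 1080p source to 900p before the 2x Compact model, then fit
# the result to a 4K display. upscale_with_compact() is a placeholder for
# whichever inference plugin the setup actually uses.
import vapoursynth as vs

core = vs.core

def upscale_with_compact(clip: vs.VideoNode) -> vs.VideoNode:
    # Placeholder: run the 2x Compact model here with your inference plugin.
    raise NotImplementedError

def animejanai_via_900p(clip: vs.VideoNode) -> vs.VideoNode:
    downscaled = core.resize.Spline36(clip, width=1600, height=900)   # 1080p -> 900p
    upscaled = upscale_with_compact(downscaled)                       # 900p -> 1800p
    return core.resize.Spline36(upscaled, width=3840, height=2160)    # fit to 4K
```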

With the current rate of progress, I'm comfortable setting a new target release date for the end of June. I hope to release a complete package of V2 consisting of the new models along with a prepackaged release archive of mpv configured with everything needed to run out of the box. But if everything isn't ready by the end of June, I'll at least post the latest release candidate of the compact model here at that time.

I'll post some new sample images later today.

the-database commented 1 year ago

New samples which show off what I've been focusing on most recently - backgrounds that are clear, sharp, and detailed, but still soft when they should be. As always, the goal is a natural and faithful look.

https://slow.pics/c/NCVa3anZ

dvize commented 1 year ago

That looks great, and no hard outline on the edges (that's found in cel-shaded games). I keep checking back every day :)

Lycoris2013 commented 1 year ago

As you say, animation has a production resolution. So you are right to downscale and then upconvert. Here is a little old Japanese site that shows the same thing. You will get the maximum effect if you downscale and then upconvert using this site as a reference. http://anibin.blogspot.com/ https://anibin.blogspot.com/2017/10/blog-post_9.html

HeartUnd3rBlade commented 1 year ago

In absolute awe at the new samples. I made a comparison with the current model I'm using (2x_AnimeJaNai_Standard_V1_UltraCompact_net_g_100000) on a 3080; granted, I resized your pic for viewing purposes to be in line with my 2560x1440 monitor. Perfect choice to show HnK. From my experience the characters tend to get blurred in the smoothing, presumably because of its pastel color palette, along with the watercolor-ish backgrounds getting too sharpened. Outstanding improvements and professional progress in V2 (or 1.138 for now)! You clearly understood and improved everything that came to my mind. The background isn't blown up with sharpening that takes your attention, and Cinnabar is the focus in hi-def. Can't wait! https://slow.pics/c/Lx342FNX

the-database commented 1 year ago

As you say, animation has a production resolution. So you are right to downscale and then upconvert. Here is a little old Japanese site that shows the same thing. You will get the maximum effect if you downscale and then upconvert using this site as a reference. http://anibin.blogspot.com/ https://anibin.blogspot.com/2017/10/blog-post_9.html

anibin is a great resource; I made extensive use of it for the training of these models (along with getnative for anime that weren't listed on anibin). As a result these models are "native-res aware" without any downscaling, and they work just as well on blurry native 720p sources, sharper native 1080p sources, and everything in between.

I would suggest only using downscaling in setups that need the performance boost.
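
For reference, here is a crude illustration of the idea behind native-resolution estimation tools like getnative (not the actual tool): descale a frame to candidate heights, rescale it back, and look for the height where the error dips. The source filter, frame number, and kernel parameters are placeholders:

```python
# Crude native-resolution probe using the Descale plugin: the native
# resolution tends to show a local minimum in the round-trip error.
import vapoursynth as vs

core = vs.core

clip = core.lsmas.LWLibavSource("episode.mkv")        # placeholder source
frame = clip[1000]                                    # one representative frame
gray = core.resize.Point(frame, format=vs.GRAYS)      # luma only, 32-bit float

for height in range(540, 1081, 18):
    width = round(height * 16 / 9)
    descaled = core.descale.Debicubic(gray, width, height, b=0, c=0.5)
    rescaled = core.resize.Bicubic(descaled, gray.width, gray.height,
                                   filter_param_a=0, filter_param_b=0.5)
    diff = core.std.PlaneStats(core.std.Expr([gray, rescaled], "x y - abs"))
    err = diff.get_frame(0).props["PlaneStatsAverage"]
    print(f"{height:4d}p  mean abs error: {err:.6f}")
```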

the-database commented 1 year ago

In absolute awe at the new samples. I made a comparison with the current model I'm using (2x_AnimeJaNai_Standard_V1_UltraCompact_net_g_100000) on a 3080; granted, I resized your pic for viewing purposes to be in line with my 2560x1440 monitor. Perfect choice to show HnK. From my experience the characters tend to get blurred in the smoothing, presumably because of its pastel color palette, along with the watercolor-ish backgrounds getting too sharpened. Outstanding improvements and professional progress in V2 (or 1.138 for now)! You clearly understood and improved everything that came to my mind. The background isn't blown up with sharpening that takes your attention, and Cinnabar is the focus in hi-def. Can't wait! https://slow.pics/c/Lx342FNX

Thanks! I can't wait to release these models myself. It has been a long road but I promise the wait will be worth it.

Lycoris2013 commented 1 year ago

As you say, animation has a production resolution. So you are right to downscale and then upconvert. Here is a little old Japanese site that shows the same thing. You will get the maximum effect if you downscale and then upconvert using this site as a reference. http://anibin.blogspot.com/ https://anibin.blogspot.com/2017/10/blog-post_9.html

anibin is a great resource; I made extensive use of it for the training of these models (along with getnative for anime that weren't listed on anibin). As a result these models are "native-res aware" without any downscaling, and they work just as well on blurry native 720p sources, sharper native 1080p sources, and everything in between.

I would suggest only using downscaling in setups that need the performance boost.

I didn't think you knew about anibin... The idea of upconverting without downscaling is very cool. I have even higher expectations for your V2!

Are you also training on the AIR BD, which is famous for its low-quality upconversion? When I encoded it before, I had to downscale it to 540p once to deal with the terrible jaggies. I believe that cel-animated SD and digitally animated SD will require completely different methods.

the-database commented 1 year ago

I didn't think you knew about anibin... The idea of upconverting without downscaling is very cool. I have even higher expectations for your V2!

Are you also training on the AIR BD, which is famous for its low-quality upconversion? When I encoded it before, I had to downscale it to 540p once to deal with the terrible jaggies. I believe that cel-animated SD and digitally animated SD will require completely different methods.

I was not familiar with AIR, but the model has learned native resolutions as low as 540p from frames such as End of Evangelion. It looks like it works OK on AIR.

Of course there will be limits to what the model can handle. Shows like Megalobox (native 405p) do need some downscaling for best results. Haibane Renmei (360p?) also benefits greatly from downscaling. While the model could have been trained to also handle sources this blurry, I thought the quality of higher native res shows would suffer as a result, and cause oversharpening on those sources, so I drew a line at 540p.

The V2 build of mpv will allow configuring multiple slots so different levels of downscaling can be placed in several slots. That should make it easy to just choose the right slot for a show after comparing them in mpv, so there shouldn't be any need for users to look up exact native resolutions depending on the anime they're watching.

Lycoris2013 commented 1 year ago

I didn't think you knew about anibin... The idea of upconverting without downscaling is very cool. I have even higher expectations for your V2! Are you also training on the AIR BD, which is famous for its low-quality upconversion? When I encoded it before, I had to downscale it to 540p once to deal with the terrible jaggies. I believe that cel-animated SD and digitally animated SD will require completely different methods.

I was not familiar with AIR, but the model has learned native resolutions as low as 540p from frames such as End of Evangelion. It looks like it works OK on AIR.

Of course there will be limits to what the model can handle. Shows like Megalobox (native 405p) do need some downscaling for best results. Haibane Renmei (360p?) also benefits greatly from downscaling. While the model could have been trained to also handle sources this blurry, I thought the quality of higher native res shows would suffer as a result, and cause oversharpening on those sources, so I drew a line at 540p.

The V2 build of mpv will allow configuring multiple slots so different levels of downscaling can be placed in several slots. That should make it easy to just choose the right slot for a show after comparing them in mpv, so there shouldn't be any need for users to look up exact native resolutions depending on the anime they're watching.

Are these comparisons 1080p downscaled to 360p and then upscaled to 2160p, vs 1080p downscaled to 540p and processed in V2? I am unable to reproduce your pre-comparison image in any way. Maybe some kind of contour smoothing filter is used.

The original BD is this image. If these terrible jaggies can be converted to your post-conversion image, it is a revolution. AIR.zip

Now that I try it again, I think it is better to downscale to 360p and then upscale, as with Haibane Renmei. If, as you say, the rest would be sacrificed, then 540p will still provide a certain level of quality. The following file is filtered with waifu2x. AIR_waifu2x_filterd.zip

the-database commented 1 year ago

The original BD is this image. If these terrible jaggies can be converted to your post-conversion image, it is a revolution. AIR.zip

I see, it looks like the source I was testing was already filtered and corrected. On your original image the V2 models aren't able to help much, so downscaling is probably necessary.

Lycoris2013 commented 1 year ago

The original BD is this image. If these terrible jaggies can be converted to your post-conversion image, it is a revolution. AIR.zip

I see, it looks like the source I was testing was already filtered and corrected. On your original image the V2 models aren't able to help much, so downscaling is probably necessary.

I understand. Thanks for the V2 release.

A few artifacts will occur, but downscaling to 540p has improved the jaggies. This is cool. Best upconverter currently available. Comparison.zip