imxieyi / waifu2x-ios

iOS Core ML implementation of waifu2x
MIT License

Broken Output Error #48

Closed mattbisme closed 2 years ago

mattbisme commented 2 years ago

While processing an HEVC video file on an M1 Max with the following configuration:

- Basic Mode
- Real-CUGAN - Anime
- 2x Upscale
- Conservative Variant
- 1.0 Intensity
- MP4 Output format
- Downscale OFF
- Keep original

Max output: 1276x956
RAM usage: 471.5/32GB

The following error occurred after getting through most of the encode: Internal error: broken output from Core ML se_mean3… (the rest of the message is truncated).

The video was only about 251 seconds. Let me know if you need any additional information.

Thanks, Matt

imxieyi commented 2 years ago

Did your device enter sleep mode? This type of error usually happens on iOS when the app enters the background. It's a transient error that should not recur if you try again, since the probability of it happening is extremely low.

One possible improvement is to retry after a short delay if such an error happens. I will consider doing this in the next version.

mattbisme commented 2 years ago

Good question! Just for clarity, I am on macOS. It had made it more than halfway through the encode before I stopped monitoring it, but I did check Activity Monitor; the waifu2x "Prevent Sleep" status was set to "Yes."

So, I wouldn't have expected it to fall asleep in the middle of an encode… but I wasn't actually in front of the computer when it would have finished. I normally leave encodes going overnight.

> One possible improvement is to retry after a short delay if such an error happens.

If it's possible to "resume" the encode from a failure point… this could save a lot of time in the future. As fast as the encodes are (considering what they do), having to start over can still lose a lot of time. Maybe not only an automatic retry, but a button for the user as well.

imxieyi commented 2 years ago

It would be very difficult to support resuming from a failure point, and most people would probably never use it. If you really need this feature, I'd suggest trying the command line with FFmpeg. You can easily start from a specific point (example). When the app fails because of an error, you can start again from the failure point and concatenate the output videos together.
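A minimal sketch of that manual resume workflow, assuming hypothetical file names and a failure point at 150 seconds (only the FFmpeg invocations are standard; nothing here comes from the waifu2x CLI itself):

```python
import subprocess

# Cut the source from the (hypothetical) failure point onward; -ss before -i
# seeks near a keyframe, and -c copy avoids re-encoding the trimmed piece.
subprocess.run(
    ["ffmpeg", "-ss", "150", "-i", "input.mp4", "-c", "copy", "remaining.mp4"],
    check=True,
)

# ...upscale remaining.mp4 in the app, then list both upscaled segments
# for FFmpeg's concat demuxer.
with open("list.txt", "w") as f:
    f.write("file 'part1_upscaled.mp4'\n")
    f.write("file 'part2_upscaled.mp4'\n")

# Join the segments without re-encoding (both parts must share codec settings).
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "list.txt",
     "-c", "copy", "joined.mp4"],
    check=True,
)
```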

mattbisme commented 2 years ago

I have done some additional encodes. This time, I ran some software to keep the computer awake until I explicitly said otherwise (only the displays were put to sleep, not the computer). It was a batch encode of six videos using ANE + GPU. The first encode succeeded, but the second encode (running at the same time) failed with the same se_mean3 error.

Both encodes are similar in length (~394 seconds), and the input (640x480) and output (1280x960) resolutions are exactly the same (they are actually both clips from the same original source). All queued encodes also had errors, but I forgot to make note of what they were before clearing them… perhaps the second video's failure is what stopped the rest?

Any guidance on how to avoid this error would be much appreciated.

imxieyi commented 2 years ago

A Real-CUGAN model is actually 5 Core ML models running sequentially. The failure rate of a single Core ML prediction is low, but it adds up very quickly with Real-CUGAN models. TBH I have no idea what triggers such errors. I have implemented a retry mechanism for this in the next update, which is currently under review. Hope it helps, but I have very little confidence that it will.
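To illustrate the retry idea (purely a sketch; the real fix lives inside the app's Core ML pipeline, and run_model below is a hypothetical stand-in for one of the five chained predictions):

```python
import time

def predict_with_retry(run_model, tile, retries=3, delay=0.5):
    """Retry a flaky prediction a few times with a short pause in between."""
    for attempt in range(retries):
        try:
            return run_model(tile)      # one Core ML prediction on a tile
        except RuntimeError:            # e.g. a "broken output" failure
            if attempt == retries - 1:
                raise                   # give up after the last attempt
            time.sleep(delay)           # short delay before retrying
```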

mattbisme commented 2 years ago

Do you have a TestFlight app available? I would definitely help test those low-confidence features. Depending on the feature, it might not be something you want to reach a wider audience right away.

imxieyi commented 2 years ago

TestFlight also requires review, which is not worth doing since regular reviews are already fast enough for macOS apps. This fix is not harmful, so I'd rather go through a new release.

imxieyi commented 2 years ago

The latest macOS version (6.2.1) implemented the potential fix. Feel free to update and report if it doesn't work at all.

mattbisme commented 2 years ago

I will do that! Also, I just did another batch last night; I didn't even let the monitors sleep this time. No error, but the next two jobs seem to be stuck on the "Generating" status. I'm not sure how long they've been like this, but it doesn't look like it's going to change any time soon. What part of the encode is this status? How long should it take?

imxieyi commented 2 years ago

Did you check CPU usage? At that stage it should be finalizing the video encode and merging audio tracks. If the output video file is too large, it can take a long time.

mattbisme commented 2 years ago

Waifu2x is hovering around 7% CPU (where 1 core = 100%). Input files have been around 3.5GB, output between 4 and 7GB (ProRes 422). These are only ~400-second videos.

imxieyi commented 2 years ago

ProRes videos are extremely large. I'd suggest that you enable "Ignore Audio" to skip merging audio tracks.

mattbisme commented 2 years ago

I know they can get big, but even now, it still says "Generating." I suspect that it must be stuck? No other encode has ever sat like that before. I didn't even know there was a "Generating" status… because it had never been there long enough for me to see it. I do use the "Ignore Audio" option.


As an aside: I do like the ProRes option. It helps prevent generational losses. My process is something like this…

  1. Rip raw content from old DVD.
  2. Max-quality 10-bit HEVC (VideoToolbox) encode with HandBrake for the purpose of deinterlacing the content (technically lossy, but I don't see a better way).
  3. Use waifu2x to upscale/clean (using ProRes to avoid additional loss).
  4. Reencode video with settings that are best for the desired content/quality.
  5. Combine the final tracks (audio/video/subs/etc.).

It's a long road, but the features and improvements put into waifu2x on macOS have made it both possible and practical.

imxieyi commented 2 years ago

Yes. It is definitely stuck. I'll look into it some time later.

VideoToolbox should be using hardware encoding, which gives much worse quality than software encoding. Generally, for best quality, you should avoid any kind of hardware encoding on any platform.

IIUC HandBrake is just a GUI wrapper around FFmpeg. I'd suggest that you use the waifu2x CLI (see this document), which can be chained with FFmpeg via pipes. In this way, steps 2-4 in your workflow can be simplified into a single step. The intermediate format passed between waifu2x and FFmpeg will be rawvideo, which is a truly lossless format (even better than ProRes).
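A rough sketch of that piped workflow, assuming a hypothetical waifu2x CLI name and flags (only the FFmpeg parts are standard; check the linked document for the real invocation, and the resolution/frame rate on the re-encode side must match the upscaled output):

```python
import subprocess

# Decode and deinterlace the DVD rip, writing rawvideo frames to stdout.
decode = subprocess.Popen(
    ["ffmpeg", "-i", "dvd_rip.mkv", "-vf", "yadif",
     "-f", "rawvideo", "-pix_fmt", "rgb24", "-"],
    stdout=subprocess.PIPE,
)

# Hypothetical waifu2x CLI invocation reading frames from stdin, writing to stdout.
upscale = subprocess.Popen(
    ["waifu2x-cli", "--input", "-", "--output", "-"],
    stdin=decode.stdout, stdout=subprocess.PIPE,
)
decode.stdout.close()  # let the decoder see a broken pipe if the upscaler exits

# Final software encode straight from the upscaled rawvideo stream.
encode = subprocess.Popen(
    ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "rgb24",
     "-video_size", "1280x960", "-framerate", "29.97",
     "-i", "-", "-c:v", "libx265", "final.mp4"],
    stdin=upscale.stdout,
)
upscale.stdout.close()
encode.wait()
```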

mattbisme commented 2 years ago

Did another batch last night; all completed in one go!

> Generally, for best quality, you should avoid any kind of hardware encoding on any platform.

Indeed, hardware encoding doesn't yield as much quality per bit. However, if you throw enough bits at it, you'll be fine, haha. That's why I used a max-quality encode when deinterlacing; there is no perceived drop in quality. I do end up with a really large file… but that's fine, because it's just an intermediate.

Having said that, I do like your CLI idea. HandBrake actually has its own CLI, and its deinterlacing is rather consistent, especially with the slowest deinterlacing settings. The upscaling process is slower than even the slowest deinterlacing pass… so I'd actually save some time by piping everything together as you suggest.

If I were to start using the CLI… how would batch encoding work? Currently, ANE + GPU in the app lets me encode a couple of videos at a time. It is also very convenient to be able to just drop a bunch of files into waifu2x and let them go. Does the waifu2x CLI automatically pick whichever is available and wait if neither is?

imxieyi commented 2 years ago

Great! Looks like the fix works.

Currently you need to start multiple encoding streams manually (or use a script, for example in Python). As for device selection, it's a bit complicated since there can be multiple GPUs on one machine, which the GUI supports. It probably makes sense to ignore multi-GPU scenarios, since most people are doing parallel processing on M1 Macs anyway. I'll add a flag in the next version to force GPU on Core ML models without specifying a GPU device.

imxieyi commented 2 years ago

The latest version (6.2.2) added a --force-gpu flag to force using the GPU. You can write a Python script that spawns multiple CLI instances, with and without the flag, to achieve parallel processing.
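A minimal sketch of that parallel setup, assuming a hypothetical CLI executable name and input/output flags (only --force-gpu is confirmed above; see the linked document for the real interface):

```python
import subprocess

# One instance forced onto the GPU, one left on the default (ANE) path,
# so both compute units stay busy at the same time.
jobs = [
    ["waifu2x-cli", "--force-gpu", "-i", "ep1.mp4", "-o", "ep1_upscaled.mp4"],
    ["waifu2x-cli", "-i", "ep2.mp4", "-o", "ep2_upscaled.mp4"],
]
procs = [subprocess.Popen(cmd) for cmd in jobs]  # start both encodes at once
for p in procs:
    p.wait()  # block until every instance has finished
```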

Document for your reference: