[ Common ] Improve the FFMpeg video decoding

ChthonVII commented 1 year ago

There are several things wrong here. Since they're in close proximity, it makes more sense to tackle them at once.

The colorspace is never checked. Rather the yuv data is just handed off to FFNx.frag (or FFNx.lighting.frag) and converted to rgb using bt601 matrices. This behavior is wrong in all cases other than bt601.
- The fix is to explicitly check the video's colorspace and return an error if it's something we can't properly convert to rgb.
- (swscale would be willing to do a yuv->yuv conversion for us, but it would be incorrect without a gamut conversion (which I don't believe swscale can do) unless the color primaries were the same (which is not true for the most common use case bt709 <--> bt601))
- We should strongly consider adding bt701 matrices to FFNx.frag (and FFNx.lighting.frag) so we can handle that case too. Many encoding tools already consider bt601 deprecated, and it's only going to get harder for users to generate bt601 video going forward.
- We should also add some documentation telling users that, at least right now, bt601 is the only colorspace that we even attempt to handle correctly.
Color range is never checked. Instead we're checking if pixel format is yuvj420p and then also using that as a proxy for color range. While it's true that yuvj420p will always be full range, the reverse isn't true. For instance, yuv420p10le could be tv range or full range.
- The fix is to explicitly check the color range, independently and in addition to the pixel format check, and condition the range conversion on that.
The intended range conversion is never happening. The call to sws_setColorspaceDetails on line 241 always has both source and destination range set to full range (see line 240), resulting in no conversion ever happening.
- The fix is to use the video stream's range as the source range.
- This bug is partially compensated for by the bug in #535 which erroneously shifts the y down by 12, putting the black point at 0. This probably made both bugs harder to spot, as looking for the elevated black point is the typical way of eyeballing tv-range output.
Coercing the pixel format to 420 doesn't make a lot of sense to me. I can see the point of coercing it to something, so we don't have to handle multiple cases, but it should be 444 rather than 420. First, if the video stream is giving us more chroma data, we're just throwing it away with 420. Second, as something specifically designed for these sorts of conversions, swscale is probably going to do a better job of resampling than texture2D.
- We should also have some documentation telling users (specifically SYW) that there's little point in using a 10-bit pixel format, since we're just going to unceremoniously coerce it down to 8-bit.

I can take a first crack at implementing these changes. However, I'm going to need some help, since I lack a suitable Windows build environment and won't even be able to tell if what I wrote compiles.

julianxhokaxhiu commented 1 year ago

Thanks for opening this issue, please let's condensate everything required to be done under this issue from now on.

Regarding the multiple issues found, indeed I'm well aware the FFMpeg decoding code is far from perfect and any optimization on that regard is welcome. Additionally, I'm going to bump soon the dependency to FFMpeg 5.0 which apparently seems to have changed a bit some internals, so I'd ask to hold on with your changes or if you really want to feel free to go ahead but please keep it under single PR.

Thanks a lot and looking forward for the improvements :)

julianxhokaxhiu commented 1 year ago

FYI I pushed the required code changes to FFMpeg 5.x, so you're now free to explore how to improve the decoding code :)

ChthonVII commented 1 year ago

Well, damn. swscale won't do range conversions on 10-bit input videos. Which is all of SYW's upscales. This is going to require an ugly, hacky fix.

julianxhokaxhiu commented 1 year ago

You could try to approach the ffmpeg community and ask for ideas. Maybe they know how to handle this situation better.

ChthonVII commented 1 year ago

Looks like maaaaybe I can just call ff_sws_init_range_convert directly. I'll try that tomorrow.

julianxhokaxhiu commented 1 year ago

I think this issue will fix as well this one: #499

ChthonVII commented 1 year ago

Things appear to be working with the commits here.

Tested & working:

tv-range bt601 yuv420p (8-bit) x264
full-range bt601 yuvj420p (8-bit) x264
tv-range bt601 yuv420p10le (10-bit) x264
tv-range bt601 yuv420p10le (10-bit) x264 with no metadata (SYW's file)
full-range bt601 yuv420p10le (10-bit) x264
The original avi file from my 1998 disks (Possibly using the wrong inverse gamma function for this one, but it's such a blocky mess on a modern monitor that I can't really tell.)
Some old xvid file of the Eidos logo that I yoinked from Grimmy's old FMV replacements

Here is an iro file that allows switching between several different encodings of OPENING.avi for easier testing.

For the record, the root of most of the problems here was that swscale is buggy as heck. First, it does un-asked-for, and almost always incorrect, color range conversions based on the range metadata and pixel format. Second, it ignores the srcRange and dstRange parameters for sws_setColorspaceDetails() in almost all cases. Third, in some cases, it will do two conversions in the same direction, totally borking the video. That's how we were ending up with doubled-up TV->PC conversions that made everything too dark.

Things left to do:

Swscale claims it does color matrix conversion with a proper gamut conversion. I need to encode some bt701 material and see if that's really what happens.
Write a video encoding guide for the documentation, explaining what is preferred, allowed, pointless, and nonfunctional.
Look into picking up the inverse gamma function from the metadata and passing that along to the shader.

Assuming the bt701 tests turn out OK, this might be ready for a pull request, since the other remaining items are minor improvements relative to the amount of time they're likely to take.

ChthonVII commented 1 year ago

I think it's done now.

We should now have:

No unwanted color range conversions done by swscale (solves the "way too dark" playback problem!)
Color range conversion done by swscale (rather than shader) if and only if input is TV-range and > 8 bits per color. (Doing it the other way -- dropping to 8-bit first, then expanding the range in the shader, throws away some data, causing a bit of avoidable banding.)
Any input colorspace is acceptable, uses swscale to convert to bt601.
Correct bt601 matrix in the shader, with precise values (no rounding)
Use swscale to convert to YUV444 pixel format internally for better chroma subsampling
Detect transfer property (a.k.a. inverse gamma function) metadata and choose between srgb or SMPTE170M (or return error if not supported)

The following are tested and working correctly:

tv-range, bt601, yuv420p10le (10-bit), srgb gamma, x264, with no metadata (SYW's file)
The original avi file from my 1998 disks (Possibly using the wrong inverse gamma function for this one, but it's such a blocky mess on a modern monitor that I can't really tell.)
tv-range, bt601, yuv420p (8-bit), srgb gamma, x264
full-range, bt601, yuvj420p (8-bit), srgb gamma, x264
tv-range, bt601, yuv420p10le (10-bit), srb gamma, x264
full-range bt601, yuv420p10le (10-bit), srgb gamma, x264
tv-range, bt709, yuv420p (8-bit), srgb gamma, x264
full-range, bt709, yuvj420p (8-bit), srgb gamma, x264
tv-range, bt709, yuv420p10le (10-bit), srb gamma, x264
full-range bt709, yuv420p10le (10-bit), srgb gamma, x264
tv-range, bt709, yuv420p (8-bit), SMPTE170M gamma, x264 -- bad posterization near black, otherwise correct
full-range, bt709, yuvj420p (8-bit), SMPTE170M gamma, x264 -- some posterization near black, otherwise correct
tv-range, bt709, yuv420p10le (10-bit), SMPTE170M gamma, x264
full-range bt709, yuv420p10le (10-bit), SMPTE170M gamma, x264
Some old xvid file of the Eidos logo that I yoinked from Grimmy's old FMV replacements

I'm not sure if the posterization with the 8-bit SMPTE170M gamma files is an encoder issue, or a swscale issue, or just values getting rounded off too many times with so many conversions. At any rate, I never expect to encounter those inputs, and I have no idea how to fix them anyway.

Here is an updated iro file containing multiple encodings of OPENING.avi. (Note that I corrected some metadata from the previous version that now matters with the transfer curve detection.)

The only task remaining is to write the documentation. I believe the code is ready to go if there's canary release planned before I get that done.

ChthonVII commented 1 year ago

Arg! I spoke too soon. The posterization in the star field and its relationship to gamma got me thinking, and that way lies trouble.

Turns out SYW's videos are using plain 2.2 gamma, not srgb gamma. (The average of the srgb inverse gamma function is x^2.2, but it's actually the more complicated formula implemented in toLinear() in FFNx.common.sh. Nevertheless, it's pretty common for lazy programmers to implement it as just x^2.2.) The culprit is probably the ERGSAN AI upscaler thingy. I'm going to guess that this may not be something SYW is able to change. But plain 2.2 gamma is totally out of spec. (FFMPEG has AVCOL_TRC_GAMMA22 in its transfer properties enum, but that's supposed to be an alias for the srgb function.) So.... what should we do here?

Implement 2.2 gamma as the default behavior when a movie has no transfer property metadata, even though it's out of spec?
Add an option to the config file to switch between 2.2 gamma and srgb gamma as the default behavior when a movie has no transfer property metadata? Which setting should be default?
Ask SYW to change the ERGSAN thingy -- if he can -- and redo all his movies to spec?

Thinking about this more leads to even more trouble -- do we have the same issue with world, field, battle, and model textures? I'm going to guess that all the AI-upscaled textures are probably using 2.2 gamma. I have no idea what image editors like GIMP and Photoshop use. Or what the original textures use. I also don't know if the BC7 compression mucks with the gamma. (Somewhat frustrated with the limits of my knowledge right now.) Anyway, this brings up another possibility:

Implement 2.2 gamma as the default behavior for everything, and assume it gets more stuff right than srgb gamma.

Thoughts?

julianxhokaxhiu commented 1 year ago

I'd prefer to keep the specs as standard as possible and then we can approach modders to do the things the right way. Until then, the driver renders all of the stuff just fine so users won't notice, until one day they'll see things "better" :)

Related to the PR #534 is it now fixing these issues ( #499, #533 and #535 )? Can we close all of them in favor of this one?

ChthonVII commented 1 year ago

It fixes #533 and #535. I have no idea about #499 and I can't test b/c I don't have a HDR monitor.

I spent some time looking at how ffmpeg handles gamma, and it changed my mind about a couple things: It turns out that AVCOL_TRC_GAMMA22 really does mean x^2.2 instead of being another alias for the srgb piecewise function. (This is contrary to what other authorities say, but makes more sense.) So, contrary to what I said earlier, 2.2 gamma isn't totally out of spec. I took a deeper look at what ffmpeg does with "unspecified" transfer properties. By default, it does x^1.0 and punts the problem of sorting out the gamma to the calling application/graphics driver/operating system/whatever. In that context, I now think that handling "unspecified" as 2.2 would be a totally kosher solution.

julianxhokaxhiu commented 1 year ago

Be aware that Gamma 1.0 is the one used by HDR so it can offer deeper blacks basically. So we need to consider this use-case as well. Even if you don't have the HDR, I can test it for you but if you can mock some code I'd appreciate.

I'll close therefore #533 and #535 in favor of this one then in the meantime. Thank you.

ChthonVII commented 1 year ago

Be aware that Gamma 1.0 is the one used by HDR so it can offer deeper blacks basically. So we need to consider this use-case as well. Even if you don't have the HDR, I can test it for you but if you can mock some code I'd appreciate.

Sorry, I should have been more clear. I was talking about the earlier step where we convert from gamma-space RGB to linear RBG, and not the later step where we convert linear RGB back to gamma space for SDR but not HDR. So instead of detecting the transfer property metadata and using it to pick between two possible inverse gamma functions, use it to pick between three possible inverse gamma functions. I'll push a commit showing that in a couple minutes. I'm not touching the later step at all.

As for #499, there are two problems that I can spot, plus maybe more that I failed to see.

The first problem is exactly what this imminent commit is meant to fix -- when you use a piecewise inverse gamma function (e.g., srgb) to linearize material that was gamma encoded with a simple curve (e.g., SYW's videos), then you're going to get posterization in the near-black range where the piecewise function is linear but the curve isn't. The posterization from linearizing SYW's videos with the srgb function is only slightly noticeable on my SDR screen, but apparently the HDR stuff makes it worse. You've got to use the inverse of the original function.

The second problem is that the HDR pipeline is feeding the linear RGB into REC709toREC2020(), but the input isn't rec709. The videos are all using the rec601 color matrix, which.... AW CRAP!! I did a bit of research to verify the guess I was about to write down was correct, and the answer turns out to be a nightmare. See below. Anyway, for purposes of getting HDR right, we'll need to do a gamut conversion from something to rec709 so that we'll have correct input to feed REC709toREC2020().

And now for the new problem that came up as I was writing the previous paragraph: Rec601 is old and there are several different colorspaces associated with it. There's the original 1953 NTSC colorspace, the EBU colorspace used with PAL and SECAM, and the updated (1987) NTSC SMPTE-C colorspace. I was about to write that these videos are probably in the SMPTE-C colorspace when I spied the line on wikipedia that says that Japan never adopted SMPTE-C and stuck with the 1953 spec. So, since the PS1 was a Japanese product from the 90's, it was probably using the 1953 NTSC colorspace. Probably. Assuming a gamut conversion was never done by the extraction tool for PS1 movies nor by SYW in his workflow, then it should still be in that gamut. (There there have probably been some errors introduced by working with it without doing that conversion...) So we should probably be doing an old-NTSC->rec709 gamut conversion on the linear RGB from the movies. Both for SDR and HDR output. But I'm afraid I can't find a matrix for that, and I don't know how to derive one. It sounds like the math is explained in section 2.6.1 of Television Engineering Handbook, Featuring HDTV Systems, Revised Edition, so I'll try to track that down an educate myself. Sheesh. This rabbit hole keeps getting deeper and deeper.

julianxhokaxhiu commented 1 year ago

Thanks for the deep dive-in once again. I'll tag @satsukiyatoshi related your color conversion issues. Maybe he can help us understand which process he did follow to release the SYW packs.

Anyway we should find a way to detect this info as we can't assume all movies we feed in do use this very old NTSC colorspace, and trigger the proper color conversion based on this.

Is it possible to force ffmpeg to do all of this garbage and abstract it from us? This was my main goal. I don't want to be rude on the PR but I'd like to touch this surface the less as possible ideally using ffmpeg to the max to take care of all of this color conversion nightmare.

ChthonVII commented 1 year ago

Good news! I've got more answers than new questions for once, plus what I think is working code.

The documentation for jpsxdec, the PS1 movie extractor that I really hope SYW used, includes the YUV->RGB coefficients obtained from reverse engineering the PS1's movie decoding chip. They are slightly off from the BT601 coefficients. But after some math, I've come to the conclusion that it's really just BT601 with some rounding errors or electrical engineering compromises or something of the sort. And, thankfully, not some custom one-off colorspace invented by Sony. (Aside: We wouldn't want to use these coefficients in our YUV->RGB matrix because jpsxdec already decoded them and any subsequent re-encoding by avisynth (or whatever) would be using the usual BT601 coefficients.)

So that tells us our original color gamut must be NTSC1953, SMPTE-C, or EBU. (Aside: I learned yesterday that the BT601 color matrix is actually mismatched with the SMPTE-C and EBU gamuts, but their specs used it anyway for backwards compatibility with older television sets, since it's "close enough.") I discarded EBU as a possibility since I don't imagine a Japanese-made console would use a color standard only adopted in Europe. That leaves NTSC1953 or SMPTE-C.

Happily, I was able to find the math for generating gamut conversion matrices, here. So I was able to make both NTSC1953->Rec709 and SMPTE-C->Rec709 matrices and test them.

NTSC1953->Rec709 looks bad. The whites were shifted too far towards red. (This was particularly obvious on SYW's upscale of the SquareSoft logo movie.) This means one of two things: (a) These movies never used the NTSC1953 color gamut in the first place. Or (b) I'm getting color shifts because this kind of colorimetric conversion handles whitepoint changes poorly and I'd need to use some kind of "perceptual" method (e.g., the Bradford method) to achieve results that don't look like crap.

SMPTE-C->Rec709 looks good. My best guess is that this is correct. Playstation1 movies probably used the SMPTE-C color gamut in the first instance. So this probably gives us colors closest to what someone in the 90s would have seen watching the movies on an American CRT television plugged into an American Playstation1. (Aside: An added bonus if this is correct is that the likelihood of introducing color errors during the upscaling process would be much lower.)

A fresh set of eyes or two here wouldn't hurt. Please try everything up through the latest commit on my now-inaptly-named fmvgammafix branch with SYW's upscaled OPENING.avi (or any of his others too). (The difference from no gamut conversion is slight. In case you can't spot it, the blues should be slightly bluer, while the reds and greens are slightly more neutral.)

[Edit: Color error spotting guide for OPENING.avi:

The background of the star field should be black.
- If it's much lighter than the black letterboxing, it's a color range error.
- If it's slightly lighter than black letterboxing, it's an inverse gamma function error.
If there are posterized near-black blobs stretching between the stars, it's an inverse gamma function error.
After Aerith stands up, the camera rotates right and the wall to her left (stage right) becomes visible. The windows on the wall should be darkly visible.
- If the windows are totally lost to blackness, it's a color range error.
- If the windows are brightly visible, it's an inverse gamma function error.
As Aerith walks out of the alley, the windows (and the brickwork under them, and the brickwork on the side of the column facing the alley) should grow darker but still remain visible. If they are totally lost to blackness, it's a color range error.]

(Aside: Pretty much everything in my previous test iro is invalid now, either because I gave it incorrect metadata (in a way that now matters) or because I did the rec601->rec709 conversion with the wrong gamma.)

These last two commits should fix both problems with FMV playback for HDR that I can see. If there are more problems with FMVs and HDR, I don't know what they are.

As for the alluring idea of moving all this color conversion crap to ffmpeg: When I started out on this, I was seriously considering asking ffmpeg to just "do everything" and give us RGB444 output. However, I gave up on that for a few reasons:

We can't get ffmpeg to do the color range conversions we want. I was just barely able to get it to not do automatic color range conversions that we don't want. Until this behavior is fixed, we can't trust ffmpeg with color range. So we need to handle color range and everything "south" of there ourselves.
There's no mechanism for telling ffmpeg that, for our use case, "unspecified" transfer properties means a pure 2.2 gamma curve.
Doing matrix math on the GPU is probably a heck of a lot more performant anyway.

The road has been longer and twistier than expected, but I think colorimetry is now working correctly, for real this time. I'm going to turn to the style issues you noted on the PR (I blame github's online editor for the indentation mess) and then the documentation.

eve-atum commented 1 year ago

@ChthonVII - If it would be of any use in your efforts, I have a PS2, the "Greatest Hits" edition of FF7, and (assuming it still works) a ~2008 CRT television I could hook up and make a recording from.

ChthonVII commented 1 year ago

@eve-atum: That would probably provide a helpful sanity check. But I'm afraid you'd have to "eyeball" it rather than record it. The colorimetric properties of the recording equipment would get tangled up with those of the TV output in ways I don't believe can be disentangled. Also, I don't know whether PS2 faithfully reproduces PS1's colors, whether the TV is still calibrated correctly after so many years, whether the TV was correctly calibrated at the factory to begin with, etc. Still, it could probably give a baseline of what "close to right" looks like.

ChthonVII commented 1 year ago

After reading satsukiyatoshi's comments yesterday, and reviewing the jpsxdec source, and messing with jpsxdec a bit and examining its output, I've come to the conclusion that satsukiyatoshi's workflow didn't appreciably alter the gamma. Rather, the unusual gamma function is present in the original PS1 videos.

Using a piecewise inverse gamma function results in posterization near black. This is true even just viewing an extracted bmp frame of OPENING.avi's star field on a srgb monitor. So it appears that these videos were originally mastered using a "pure" exponential curve. (And that the SMPTE 170M and SRGB functions are both inapt.)

But what exponent?

SMPTE 170M was the standard for the broadcasts for the televisions that Playstations were plugged in to. It's a piecewise function approximately equal to x^2.0. However, that's deliberately too light in order to counteract (a) the tendency for CRT televisions to be darker than their nominal gamma, and (b) the typical old-time television use case of watching it in a darkened living room.

The nominal gamma for a CRT television was 2.2.

The typical real-world gamma of CRT televisions tended to fall between 2.2 and 2.4.

My feeling is that these videos were probably mastered with a lighter gamma, since there are details that get lost in darkness with higher gammas. But that's more of a guess than an authoritative conclusion.

So... ultimately I decided to punt and make the gamma used for movies that lack pertinent metadata configurable via ffnx.toml. The current default is 2.0. Any thoughts on whether that's the best default?

satsukiyatoshi commented 1 year ago

most tv/projector are looking better with a gamma set to 2.2 to my mind, but the real 'ideal' value can't exists because most of user don't use a calibrated display.

better use eyes than mathematical perfection ^^.

one option witch can be really helpfull (maybe with the debug tool) to test that and also hdr nits, would be to 'split' the output with 2 or 4 setting so anyone could choose his best value.

ChthonVII commented 1 year ago

OK, 2.2 it is then.

ChthonVII commented 1 year ago

I've got some good news, and some bad news.

The good news is that I've pinned down the color gamut conclusively. The wikipedia article I cited before was half correct. Japan never did adopt SMPTE-C, and they did keep their old color gamut, but their old color gamut was not NTSC1953. Rather it was 470m93, informally called "NTSC-J," which has the same red, green, and blue points as NTSC1953, but its white point is very different and unique to Japan. So I fired that up in vapoursynth, and it's pretty obvious it's correct. This was how these videos were intended to be displayed. (And few people outside Japan have seen them this way.)

Now the bad news: NTSC-J isn't included in x264's --colorprim enum, so there's no way to flag NTSC-J video in the x264 metadata. Ditto for x265. It gets worse. FFMPEG doesn't support NTSC-J at all either. So again, there's no way for ffmpeg to tell FFNx that a video is NTSC-J. And, probably worse, ffmpeg can't do color matrix conversions correctly when it can't do gamut conversions correctly, so all videos derived from the Playstation disks using matrices other than bt601 are going to be hopelessly wrong.

So far as I am aware, there are only two ways to get from the NTSC-J gamut on the Playstation 1 disks to the bt709/srgb gamut used by modern PCs. And both are unappealing.

Do it at encode time using the fmtconv filter for vapoursynth/avisynth. This is really unappealing for a couple reasons:
- It means trashing all existing videos and forcing people to adjust their frameserver scripts and encode all over again. People won't be happy. A lot of videos will never be redone.
- It's not trivial. It takes 7 calls to fmtconv to get the job done. Most people on the avisynth side don't even use fmtconv and expect a single call to one of avisynth's built-in conversion functions to do what they need. So this would be expecting a lot from people. Possibly too much.
- It requires guessing at the original gamma function, and the consequence of guessing wrong would be trashing another generation of videos and making people encode all over yet again. It's probably a pure 2.2 curve, but I'm less confident of that than I'd like to be.
Do it at playback, ourselves, in the shader. This is also unappealing for a couple reasons:
- Doing a gamut conversion "correctly" is more complex when the white points are different. I think I can get a handle on this though.
- It's a compatibility disaster. We can't distinguish NTSC-J videos by their metadata, but if we just apply a NTSC-J-to-709 gamut conversion to everything, then we're going to break the colors on any videos that aren't derived from the Playstation disks. So replacement videos using footage from Remake or Crisis Core or whatever would break. Ditto with self-made renders. Ditto with the mod logo videos. (Though I don't care too much about breaking those.) I have no idea if the avi files from the PC release ever had their gamut converted. I'm guessing not, so they'd be OK.
  - One possible approach is to say, "OK, fine, we give up on being a general-purpose video player. We're only going to play upscaled Playstation material correctly." I don't like how that feels, but an overwhelming majority of users are probably using SYW's videos, so it may be the sensible choice.
  - The other possibility is to come up with some out-of-band way to signal whether a video is NTSC-J.
    - The best idea for that I'm able to come up with is looking in two directories for movie files, and treating files found in one of those as NTSC-J. (Maybe "movies" and "movies_non_psx.") This would push more complexity onto 7th Heaven and mod makers, and force some mods to get repacked. But that's probably less bad than making people re-encode their videos.
    - (One thing that won't work is using colorprim=undef to signal NTSC-J material. Virtually no one includes the colorprim metadata in their encodes, so this is functionally the same as treating everything as NTSC-J.)

My initial take is that, from this set of bad options, the least bad option is to just treat everything as NTSC-J and give up on playing anything else correctly. I'm going to set to work implementing that to make sure it's really as feasible as I think it is. In the meantime, I'd really appreciate some thoughts and feedback here.

(Aside @CosmosXIII: I'm sorry to say that your really cool FMV pack swapping out the low-poly chibis is hopelessly busted. It mixes material in two (very) different color gamuts in the same frame, so one of them is always going to be displayed wrong no matter how you play it back. Once this is sorted out, I can help you redo them if you like.)

satsukiyatoshi commented 1 year ago

As i say to my mind this with probably not worth it because yes color misreading/convertion/decoding can be a mightmare to get the perfect rendering as intended by game designer.

But don't forget that: -ff7 source FMV are realy color degraded because of the psx limitation, FMV use limited colors palette and very limited resolution, so the psx renderer is probably not the render the artist originaly wanted((some case are even worse i have some jap anime DVD with buildin red filter to compensate the color derivation of first plasma TV in japan, so it can never be played as it should unless you own one of those tv, it's probably the same with ff7 fmv colored with crt color derivation in mind). -most used never set up their display in an accurate way, it will ruin the colors/contrast anyway, i don't think they gona notice the great job you're doing to try to restore the color accuracy. -when upscaling ff7 fmv for my pack i used on the same (calibrated) tv both the ff7 psx jap on my actual psx with real rgb cable VS my computer's playing the upscaled fmv in MPC-HC + madVr and the colors/contrasts looks realy close (maybe you can check the way madVr's rendering video - with mpc-hc i'm using lavfilters for decoding).

So yes and yes, trying to be as accurate as possible is great and i love that, but there's simply to much 3rd party stuff you can't deal with because not FFNx's dependant to be 100% accurate.

ChthonVII commented 1 year ago

-when upscaling ff7 fmv for my pack i used on the same (calibrated) tv both the ff7 psx jap on my actual psx with real rgb cable VS my computer's playing the upscaled fmv in MPC-HC + madVr and the colors/contrasts looks realy close (maybe you can check the way madVr's rendering video - with mpc-hc i'm using lavfilters for decoding).

That's to be expected. The issue is that what you see on your TV is different than what you see on your PC monitor; or, rather, what you see on a 1990s Japanese CRT TV is veeeery different than what you see on your modern PC monitor. The wavelength of light the old Japanese CRT TV poops out for, say, r=0,g=196,b=64 is very different from the wavelength that your PC monitor poops out for r=0,g=196,b=64. The point of gamut conversion is to make the PC's output look like the old CRT's output, so far as possible.

You say the difference is so small it's going to be swamped by monitor miscalibration, but you haven't seen it. It's a big enough difference that it's still going to be noticeably different even on a poorly calibrated monitor.

Also, "no one will be able to tell the difference anyway" cuts both ways. It's an argument for "don't bother with gamut conversion at all," but also just as strong an argument for "make it play satsukiyatoshi's videos correctly and don't bother with anything else."

satsukiyatoshi commented 1 year ago

As i said i compared both on my tv (psx and computer plugged to same tv, also do with the same projector but not relevant because i needed to use a scart to hdmi external converter).

I think we won't agreed about other points so just do as you want ^^, i've no problem with that

CosmosXIII commented 1 year ago

Aside @CosmosXIII: I'm sorry to say that your really cool FMV pack swapping out the low-poly chibis is hopelessly busted.

Oh god, don't make me take out my katana lol

Now more seriously, a few dumb questions:

So if I understand correctly this NTSC-J standard is how it probably was intended when Square developed the game using 90s CRT Japanese TVs as reference. And the movie data in the US / Europe versions of the game is still unchanged ? (it's still encoded as NTSC-J?). If so, then it means not even Square cared about the correctness of all of this at the time (not saying it's not worth fixing).
The wiki page for the NTSC-J standard says that NTSC-J is similar to American NTSC, the only difference being the black levels, requiring an adjustment in brightness for it to look right. Then why does NTSC1953->Rec709/sRGB conversion shifts color to red? shouldn't it just be a difference in brightness? https://en.wikipedia.org/wiki/NTSC-J
You say one way to fix this is doing some magic on the shader. But I thought the problem was a wrong conversion from whatever they original used (NTSC-J?) to what Satsuki used for his movies (yuv420p?). Isn't the damage already done here? Is it really possible the recover the data as it would be if a correct conversion was done at encode time?
What do you mean my FMV mod mixes two different color gamuts? Isn't this just the same problem as with SYW ones where supposedly a bad conversion was done from NTSC-J to yuv420p? I did my mod using Blender, my hope was that when importing a video it would convert to linear space RGB or compositing, then before exporting it would convert back to yuv420p color space (ffprobe for my movies says it's using the same yuv420p as SYW).

I haven't tried your recent changes yet but if you were able to make it look similar as when playback in Windows Media Player, VLC or whatever external media player then I think it is enough of an improvement. I'm afraid this pursue of ultimate correctness may be too much of a rabbit hole but if the correct conversion from NTSC-J is really worth it then IMO making it optional and disabled by default in the FFNx.toml would be best (otherwise we get into compatibility hell). This option could be automatically set by 7th by just adding a few lines to the mod.xml of the mod so no need to expose this option to the user.

ChthonVII commented 1 year ago

So if I understand correctly this NTSC-J standard is how it probably was intended when Square developed the game using 90s CRT Japanese TVs as reference. And the movie data in the US / Europe versions of the game is still unchanged ? (it's still encoded as NTSC-J?). If so, then it means not even Square cared about the correctness of all of this at the time (not saying it's not worth fixing).

Correct. Probably a couple of reasons for this: (1) 1990s Sony and Square didn't give a darn about the rest of the world beyond Japan. From their perspective, we were a "secondary market," and one full of philistines at that, so they didn't put in much more than minimal effort to get things working for us. (Remember the translation "team" of one guy on an impossible deadline?) (2) The entire notion that you can do math on video pixel values to convert them between various television standards was rather new at the time. They might not have even known that this was something that could be fixed.

The wiki page for the NTSC-J standard says that NTSC-J is similar to American NTSC, the only difference being the black levels, requiring an adjustment in brightness for it to look right.

As I've painfully learned, wiki is partially wrong about NTSC-J. The red, green, and blue points are the same as NTSC1953, but the white point is very different. (See footnote 3 on page 16.) It's lot brighter and a bit bluer (than srgb/rec709; it's much bluer than NTSC1953, which is, as you noted, quite red). The white point is the big difference that has so much impact when corrected for. The thing about the black pillar is only applicable to analog playback; we don't have to worry about it.

Then why does NTSC1953->Rec709/sRGB conversion shifts color to red?

Different white point.

What do you mean my FMV mod mixes two different color gamuts?

Let's say that you have two CRT televisions. Let's call them TV1 and TV2. Let's say that they both use the same red and blue phosphors, but TV2's green phosphors are twice as intense as TV1's. What happens when you play a video mastered for TV1 on TV2? Everything looks way too green. If you play a video mastered for TV2 on TV1? Everything looks way too purple. This is a color gamut problem. Now, what happens if you take some raw RGB data from each video and cut-and-paste them together into a single frame? It won't look correct on either TV. On TV1, the parts of the frame that were mastered for TV2 will look too purple; and on TV2, the parts of the frame that are mastered for TV1 will look too green. Even if you have a way to convert material from one TV's gamut to the other's, you still can't fix the problem because you just flip which part looks wrong. This is what you've done. You've cut-and-pasted material that uses the srgb gamut together with material that uses the NTSC-J gamut. The only solution is to go back to before you cut-and-pasted them together and convert one of them first. Then you can cut-and-paste apples to apples.

Now, how would such a conversion be done? Going back to my hypothetical TV1 and TV2: To make video mastered for TV1 look correct on TV2, convert from RGB to linear RGB to XYZ, then cut the green in half, then convert back from XYZ to linear RGB to RGB. The other direction is a harder case because TV2 can display some shades green that TV1 just physically can't. You have to choose between precisely converting everything TV1 can display and clipping what it can't -- which leads to banding -- or scaling everything to fit inside what TV1 can display -- which generally looks better but has some displayable-by-TV1 colors slightly off. Things get more complicated yet if the white points for each gamut are not in the same spot in XYZ color space. In that case you need to convert to XYZ and then scale the relative distances to the white point. In practice: At encode time, it takes about 7 calls to fmtconv in vapoursynth/avisynth. In the shader, we can precompute all the matrix math and reduce it all to a single matrix multiplication.

You say one way to fix this is doing some magic on the shader. But I thought the problem was a wrong conversion from whatever they original used (NTSC-J?) to what Satsuki used for his movies (yuv420p?). Isn't the damage already done here?

It's not that an incorrect conversion has been done, but rather that a conversion that needs to be done hasn't been.

There is an entirely independent problem with Satsuki's videos (and yours too) that there's been a color range compression from full to TV range. That problem is only partially fixable without redoing everything. But right now we're talking about color gamut, not color range.

I haven't tried your recent changes yet but if you were able to make it look similar as when playback in Windows Media Player, VLC or whatever external media player then I think it is enough of an improvement.

Sadly, VLC has a bug similar to the very first one that drove me to start working on this. (Or, rather, ffmpeg has a bug and VLC doesn't work around it.) So my FFNx branch is now looking better than VLC for some inputs.

I'll try to post some screenshots later showing how much change the gamut conversion gets us.

ChthonVII commented 1 year ago

As promised, here are some comparison shots with and without NTSC-J gamut correction. Gamut corrected versions are on the right. (Note that the gamma in these shots is srgb rather than 2.2, since changing it would have been a pain.)

The big blue spotlight (???) things are now blue. blueisblue

Aerith's eyes are green. (So are the sparks.) aerith_green

Go red engine go! goredengine

Everything is more saturated due to the brighter white point. everything_saturate

Barret's skintone looks healthier. barretskin

So does Cloud's. cloudskin

shikulja commented 1 year ago

@ChthonVII Excellent work, your research is extremely interesting to read. Working with color can be irreparable, so it should be avoided. I became more attentive to this when I created the ff8 font in an indexed palette, and the output should have been in srgb, which the tim conversion utility works with. Lately I've been using a cheap hdr monitor (32UN500) that doesn't fully meet the specs, without a reference it's impossible to tell if the colors are correct. Therefore, it is so important that the correct output of the image was initially.

I would like to follow you to see your research in the future, if you have any kind of blog

PS. Aeris on the right has a soft pink skin tone. Instead of the pale one on the left. You were too embarrassed to say this.) (joke)

ChthonVII commented 1 year ago

On further inspection, it appears that swscale does NOT do the accompanying gamut conversion when it does a color matrix conversion. This means that we're presently getting anything with a non-601 matrix wrong. Most importantly, we're getting bt709 wrong.

The best fix for bt709 is probably not to do a 709->601 matrix conversion in the first place, add 709 YUV->RGB conversion to the shader in parallel with the 601 YUV->RGB conversion, and then we don't need to do a gamut conversion at all, since 709 and srgb share the same gamut.

This sort of helps us out of compatibility hell with the NTSC-J gamut stuff. If we have to branch at the gamut conversion stage anyway, we can implement 709->srgb (no change), SMPTE-C->srbg, and NTSC-J->srgb in parallel, and then limit use of NTSC-J to cases where there's no colorprim metadata and the matrix is 601 (or assumed to be 601 due to lack of metadata). We can probably assume that most of the cases where people are using material from Remake/Crisis Core/Advent Children/etc. or self-generated HD material that it is -- or at least should be -- using matrix 709. (And if they aren't labeled, ffmpeg can fix the metadata without re-encoding.) So that should vastly reduce the circumstances where we'd be applying the NTSC-J->srgb gamut conversion when we shouldn't be. EIDOSLOGO.AVI and SQLOGO.AVI are going to be sticky wickets, since they didn't exist on the Playstation disks and don't use the NTSC-J color gamut. Perhaps the best way to deal with them is just to look at the file name.

While we're in the business of bypassing swscale, we might as well just pass along bgr24 (from the original PC edition videos) with the planes reordered. No point in going to YUV and back. Possibly the same for whatever FF8 uses, if it's RGB... I'll have to dig my disks out to see. If anyone's got FF8 original PC edition video files at hand, please run ffprobe and post the output.

All this may take awhile to implement, but seems pretty straightforward. At least I don't need to go learn any more new colorspace mathematics to do it.

ChthonVII commented 1 year ago

And I missed a substantial bug where we weren't actually doing the color matrix conversion either...

julianxhokaxhiu commented 1 year ago

Allright tbh this thread and the PR is becoming too difficult to follow as there's an extensive research going on. The fact you're also spreading the conversations into two different threads ( this issue and the PR ) makes it even more difficult to understand it.

Where are we with the implementation of this one? What is left to be done and what cannot be fixed? Can you please write a brief report of what I basically need to check on top of your changes?

Thank you

ChthonVII commented 1 year ago

The commits pushed so far only achieve correct playback for bt601 videos derived from the Playstation sources. I'm about 65% done implementing the stuff noted in the post two above yours. That should get us to the point where all inputs we can reasonably expect to see will be handled correctly. I've also got some half-finished documentation that will need to be substantially rewritten.

I also need to figure out where my FF8 disks are so I can check their video format. If you happen to have FF8 installed, I'd really appreciate the output of ffprobe for a FF8 video.

julianxhokaxhiu commented 1 year ago

Regarding FF8, the movie rendering works a bit different than FF7. Unlike there where we do have direct AVI files that we literally open and render through ffmpeg, on the original 2000 release this process goes through the internal game engine and all we do is render frame by frame, nothing special in there.

Although on FF8 2013 Steam release we do have external movie files and this is where we use the FFMpeg layer as well for FF8, Although those ones are literally re-encoded by Square-Enix so I'm not sure which process they did follow. I'll post the ffprobe output in a moment for you.

//EDIT: there it goes

$ ffprobe disc04_00h.avi
ffprobe version n5.1.2 Copyright (c) 2007-2022 the FFmpeg developers
  built with gcc 12.2.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, matroska,webm, from 'disc04_00h.avi':
  Metadata:
    encoder         : libebml-0.7.5 & libmatroska-0.7.7
    creation_time   : 2012-12-06T16:52:41.000000Z
  Duration: 00:00:08.81, start: 0.000000, bitrate: 382 kb/s
  Stream #0:0(eng): Video: vp8 (VP80 / 0x30385056), yuv420p(progressive), 1280x960, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 1k tbn (default)
  Stream #0:1(eng): Audio: vorbis, 48000 Hz, stereo, fltp (default)

shikulja commented 1 year ago

FF8 STEAM

Input #0, matroska,webm, from 'disc04_00h.avi':
  Metadata:
    encoder         : libebml-0.7.5 & libmatroska-0.7.7
    creation_time   : 2012-12-06T16:52:41.000000Z
  Duration: 00:00:08.81, start: 0.000000, bitrate: 382 kb/s
  Stream #0:0(eng): Video: vp8 (VP80 / 0x30385056), yuv420p(progressive), 1280x960, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 1k tbn (default)
  Stream #0:1(eng): Audio: vorbis, 48000 Hz, stereo, fltp (default)

FF8 2000 videos in .Pak format https://ff7-mods.github.io/ff7-flat-wiki/FF8/FileFormat_PAK.html https://forums.qhimm.com/index.php?topic=18656.25

HIRES

Input #0, bink, from '31.bik':
  Duration: 00:03:15.00, start: 0.000000, bitrate: 2047 kb/s
  Stream #0:0[0x0]: Video: binkvideo (BIKf / 0x664B4942), yuv420p(tv), 640x448, 15 fps, 15 tbr, 15 tbn
  Stream #0:1[0x0]: Audio: binkaudio_rdft, 44100 Hz, stereo, flt

LOWRES

Input #0, bink, from '31.bik':
  Duration: 00:03:15.00, start: 0.000000, bitrate: 914 kb/s
  Stream #0:0[0x0]: Video: binkvideo (BIKf / 0x664B4942), yuv420p(tv), 320x224, 15 fps, 15 tbr, 15 tbn
  Stream #0:1[0x0]: Audio: binkaudio_rdft, 44100 Hz, stereo, flt

ChthonVII commented 1 year ago

Regarding FF8, the movie rendering works a bit different than FF7. Unlike there where we do have direct AVI files that we literally open and render through ffmpeg, on the original 2000 release this process goes through the internal game engine and all we do is render frame by frame, nothing special in there.

Although on FF8 2013 Steam release we do have external movie files and this is where we use the FFMpeg layer as well for FF8, Although those ones are literally re-encoded by Square-Enix so I'm not sure which process they did follow.

Well shoot. Even if I find my disks, it won't help then. Would you mind sending me one of the FF8 Steam release movie files for testing? Also, does FF8 have any movies that exist in the PC version but not the Playstation version? (FF7 has eidoslogo.avi and sqlogo.avi that were added for the PC release. They use a different color gamut.)

shikulja commented 1 year ago

@ChthonVII I didn't find you on the discord, you can write a mail, I'll try to send

ChthonVII commented 1 year ago

Thank you shikulja!

As best I can tell:

The FF8 Steam videos are from a different source as the PS1 videos. The aspect ratio (the PAR) is a bit different and PS1 is cropped as compared to Steam.
No metadata on the Steam videos :(
Gamut conversion was already done.
Color matrix is probably BT709. Honestly, it's really hard to tell because CGI video lacks real-world objects where incorrect color is more obvious, and my best basis for comparison isn't the same video to start with, and has to first be fed through a gamut conversion that likely uses a different algorithm than Steam did.
Color range is TV. (Seriously, what the hell Steam?! You guys can figure out how to do a gamut conversion, but then you compress to TV range for a PC release?)

I'll try to get this accounted for later today.

ChthonVII commented 1 year ago

Does anyone know if bgfx applies any dithering when reading in the buffer of unsigned chars and converting them to a buffer of 32-bit floats?

julianxhokaxhiu commented 1 year ago

Bgfx does no data manipulation, it passes it as it is to the GPU.

ChthonVII commented 1 year ago

New toy! Command line png-to-png tool for applying NTSC-J-to-sRGB gamut conversion. Hopefully useful for satsukiyatoshi and other folks working with the original textures. (Research even turned up a nice dithering algorithm that doesn't care if the input image is "swizzled" like FF7 field textures.)

ChthonVII commented 1 year ago

Does anyone know how to fetch texture dimensions in bgfx? Since this is supposed to be bastardized glsl, I tried textureSize(), but that results in a D3DCompile error. There's a function named bgfxTextureSize() in bgfx_shader.sh, but that results in a "no function with name..." compile error. (Is the #include somehow failing?)

I could pass them as uniforms, but that's slow and wasteful when the shader language has (or at least should have) a way to access the dimension directly.

julianxhokaxhiu commented 1 year ago

@ChthonVII it depends on which shader file you're calling it. Not all of them include the bgfx shader file. Can you post here a diff or a patch link to the change you're trying to accomplish?

ChthonVII commented 1 year ago

I'm working with FFNx.frag and FFNx.lighting.frag, both of which have #include <bgfx/bgfx_shader.sh> at the top.

I want the dimensions of tex_0, tex_1, and tex_2.

This gets the D3DCompile error:

ivec2 ydimensions = textureSize(tex_0, 0);
ivec2 udimensions = textureSize(tex_1, 0);
ivec2 vdimensions = textureSize(tex_2, 0);

This gets the "no function with name..." error:

vec2 ydimensions = bgfxTextureSize(tex_0, 0);
vec2 udimensions = bgfxTextureSize(tex_1, 0);
vec2 vdimensions = bgfxTextureSize(tex_2, 0);

julianxhokaxhiu commented 1 year ago

@ChthonVII I was doing some tests and I think it's not a covered function in bgfx or there is a bug. I'll try to approach the OA and I'll ask him about possible next steps here. bgfxTextureSize is not known because it is an internal function and not an exposed one. Although if we use textureSize which is defined, it doesn't match the sample type ( first param ) because it's a sample2D not a BgfxSampler2D.

//EDIT: Nvm, I found the core issue. The error pops up because the function you're trying to use doesn't work under DirectX 9. As this was an experimental backend wanted by @eve-atum tbh we can start throwing it out the window. Consider removing these lines for now so you're unblocked. We'll evaluate once you're ready how to proceed afterwards.

ChthonVII commented 1 year ago

@julianxhokaxhiu: Yep, that was it. Is there a way to gate stuff behind #ifdef's or something so the D3D9 shaders can be built without this feature?

julianxhokaxhiu commented 1 year ago

@julianxhokaxhiu: Yep, that was it. Is there a way to gate stuff behind #ifdef's or something so the D3D9 shaders can be built without this feature?

You can try to use this syntax:

#if BGFX_SHADER_LANGUAGE_HLSL > 300
// your core here
#endif...

ChthonVII commented 1 year ago

@julianxhokaxhiu: Worked like a charm. Thanks!

@everyone: I just pushed a commit with an experimental gamma function and I'd like people's thoughts on it.

Pros are:

It looks pretty darned good to me, in that...
- It avoids the banding near black that we get in the "toe slope" of piecewise functions like SMPTE170M and sRGB.
- The shadows are brighter than a pure 2.2 curve, where I felt like details were getting lost to darkness. For example, the alleyway in opening.avi. (It's even brighter than a 2.0 curve for inputs less than ~0.04.)
- The brights aren't blown out like they are with a pure 1.9 or 2.0 curve .
It's overall pretty close to a pure 2.2 curve, which is the nominal gamma for NTSC.
Except near black, it's very, very close to the sRGB function that we're using for non-video stuff. (Which means that we could get close to seamless transitions from video to gameplay for things like the Gold Saucer cable car, if we did a gamut conversion on the field textures. (Which I plan to, using my new png tool.))
It would provide an excuse to remove the user-controlled gamma setting that julianxhokaxhiu really, really doesn't like.

Cons are:

It's an unprincipled hack. This can't possibly be the "correct" gamma function for this material, because I just made it up.

ChthonVII commented 1 year ago

Status Update:

I think the revamp of video playback is now done. Really and truly done this time. Playback, especially colorimetry, for (a) the Playstation videos, (b) SYW's videos, (c) the FF7 PC release videos, and (d) the FF8 Steam release videos should now be as good as I am capable of making it. Other stuff should play back correctly too, assuming it has accurate metadata.
A few things are untested: The bt709 full-range color matrix and all the PAL stuff. These should be correct unless I managed to mess up when transposing a matrix to column-major.
I added some fairly extensive documentation in /docs/mods
I'll be away for the next week without computer access. Please feel free to move forward on this without me.

julianxhokaxhiu commented 1 year ago

@ChthonVII thanks a lot for the status update. I'm tracking the PR and I think you've made some fantastic progress on this one. The only part I see yet not tested/covered is the HDR part. Would you be able to take care of this eventually? As you've spent a lot of time digging into this part, and you could double check if the HDR rendering looks good as well I'd appreciate.

Great work so far and thanks again, looking forward on getting this merged :)

//EDIT: If you can also update the PR description to be fit with all the changes you did so far I'd appreciate. I think it deserves some cleanup and I think you're actually testing your own PR so please make sure that part is reflected as well. Thank you in advance.

ChthonVII commented 1 year ago

The two major obstacles to me working on HDR stuff is that (1) I don't have a HDR monitor, and (2) I don't have a detailed grasp of how HDR display is supposed to work. It may be possible to remedy the latter through education, but the former is going to mean a lot of annoying "can you test and see if this does what I think it does?" posts. I'll take a look at it when I return.

Speaking of annoying "can you test this?" requests, how does HDR look for the videos with with the current PR? Is there a significant difference between tv-range and full-range inputs? (I.e., does the dithering done on tv-range inputs incidentally improve the HDR stuff?)

[edit: The green crap in the star field is likely a missing saturate().]

julianxhokaxhiu / FFNx

[ Common ] Improve the FFMpeg video decoding #536