ramapcsx2 / gbs-control


Maintain constant latency with clock generator installed #405

Closed nyanpasu64 closed 1 year ago

nyanpasu64 commented 1 year ago

This PR continuously adjusts the video output frame rate to keep video output a fixed distance (currently 1/4 frame) behind input, when "FrameTime Lock" is checked and an external clock generator is installed. This fixes the bug where, with a clock generator installed, GBS-Control would initialize input-output latency to a random value, and the latency would then slowly drift over time, eventually producing a tear line between images 0 and 1 frames old. (Technically the initial latency is still random, but latency is reliably brought to 1/4 frame within seconds.)

The latency control algorithm used is more primitive than the one I described in https://github.com/ramapcsx2/gbs-control/issues/286#issuecomment-1401230470; instead, we use the clock generator to offset the output frequency from the estimated input frequency by a factor proportional to the latency error (clamped to ±0.001 to avoid large frequency offsets). And I'm solely using blocking latency measurements every 100 frames (plus overhead), like your older frame sync code, rather than rewriting the code further to add continuous interrupt-driven non-blocking measurements.
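For concreteness, here is a minimal sketch of that control law as a simple proportional controller; the function name, signature, and gain value are mine for illustration, not the actual PR code:

```cpp
// Minimal sketch of the proportional latency controller described above.
// Names and the gain value are illustrative, not the actual PR code.
// latencyErrorFrames = measured latency minus the target latency
// (syncTargetPhase / 360 = 0.25 frames), in fractions of a frame.
float computeOutputFps(float estimatedInputFps, float latencyErrorFrames) {
    const float gain = 0.01f; // proportional gain (arbitrary here)
    float offsetRatio = gain * latencyErrorFrames;
    // Clamp to +-0.001 (+-0.1%) to avoid large frequency offsets.
    if (offsetRatio > 0.001f) offsetRatio = 0.001f;
    if (offsetRatio < -0.001f) offsetRatio = -0.001f;
    // Output runs slightly fast while it lags the target (positive error),
    // slightly slow while it leads, so latency converges on the target.
    return estimatedInputFps * (1.0f + offsetRatio);
}
```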

But the results are surprisingly good. If you turn on FRAMESYNC_DEBUG, the logs show my code consistently brings latency to a steady state within 0.0005 (5 × 10^-4) frames of FrameSyncAttrs::syncTargetPhase (90 degrees = 1/4 frame behind input), starting from console power-on, input resolution changes (240p/480i/480p), or after repeatedly changing the output resolution to "480p 576p" to deliberately induce extra latency. I did not test replugging the GBS-C, but I see no reason it wouldn't work well too. I tested primarily at 480p output, but 960p output seems to work fine too.

I also measured video latency (at 480p output) by feeding the input luma signal (through a Y-splitter) and a solar panel at the top of my CRT into the left and right channels of my computer's audio line-in, then recording at 192 kHz. This showed a latency of just over 4 milliseconds from video input to output, consistent with the debug logging.

In "Pass Through" mode, frame sync is unnecessary. My logs show that it is never attempted.

This is a working prototype, but not ready for upstreaming.

Issues

Not bugs, but room for improvement

(I've spent long enough poring over my code and reading/revising this post over and over...)

Fixes #286.

ramapcsx2 commented 1 year ago

commit history: simple, I don't care. At all. The history is a great mess already; how much worse can it get? ;p

findBestHTotal: This comes up in a few variations. Basically, I don't think this is meant to run (or make changes) with the clock gen installed. If it does, then it was for side effects, or to keep custom settings in presets.

Do you use a debugger: Nope, feel free to add support, if you want :)

freak measurements: Comes up in a few variations. Basically, if you measure 77 Hz on the source, and it's just a 480p Wii, that's wrong :p It needs to be ignored and another measurement scheduled; maybe the sync situation isn't stable...

when to take VSync period measurements: Experiment and see what you get :)

small clock gen adjustments: The idea is to try to keep the attached display happy, by indeed changing the frequency in very small steps and with some wait time. Whether it works or not depends a lot on the display, though. If the display doesn't like it, it will drop sync, go to a black screen, and resync.

volatile: These are not ideal, but you're guaranteed that reads and writes are atomic on this architecture (no read while it's being written). Volatiles do not prevent the compiler from reordering or otherwise breaking the intent, though. Proper atomic variables would solve it (but are a bit tricky to use). Basically, I just assumed they work fine until proven otherwise.
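For illustration, a small sketch of the difference, using hypothetical flag names (not gbs-control's actual variables):

```cpp
#include <atomic>

// volatile only guarantees the access itself isn't optimized away; it
// does not stop the compiler from reordering surrounding memory ops.
volatile bool frameFlagVolatile = false;

// std::atomic gives atomic access, and its memory ordering also
// constrains reordering around the access.
std::atomic<bool> frameFlagAtomic{false};

void onVsyncInterrupt() { // hypothetical ISR
    frameFlagAtomic.store(true, std::memory_order_release);
}

bool consumeFrameFlag() { // called from the main loop
    return frameFlagAtomic.exchange(false, std::memory_order_acquire);
}
```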

ramapcsx2 commented 1 year ago

Broadly though, I want to encourage you to try your new code with various sources, and if it seems to work, that's good enough for me. It's going to be an improvement!

nyanpasu64 commented 1 year ago

https://github.com/ramapcsx2/gbs-control/issues/286#issuecomment-1407641143

If you can run a few more sources (SNES and PSX, for example, cover a broad range), all the better.

I'm going to test my PR on SNES later today. I don't have a PSX, but running 240p/480i transitions in 240p Test Suite GX showed (EDIT: in further testing) up to 0.053633 frames of latency change, or 0.064307 frames if I time a transition during the beginning of a frame sync measurement, which is well under the 0.25 frames of target latency. In fact I think even 0.1 frames of target latency would be (EDIT: marginally) safe even in games switching between these video modes, but I'm not 100% confident there would never be tear lines at that latency.

If fixes and improvements can be broken up into smaller PRs, then that's good, but sometimes everything is interrelated.

Ideally I'd remove the compile_commands.json and restore the OLED menu. At that point, I'll have to decide whether to split out my earlier changes ("Rename serialCommand and userCommand", "Remove commented-out code blocking code folding", enum OutputMode, console.log("GBSControl.ws.onmessage()"); (still not sure what I did in ae28a351f7d9668ba9164b54f824ecd84dc3740d)) into a separate PR.


freak measurements: Comes up in a few variations. Basically, if you measure 77 Hz on the source, and it's just a 480p Wii, that's wrong :p

I'm thinking that running the frame lock every 6 frames instead of every 100, in a local debug build, will make the incorrect FPS measurements much more reproducible (a much higher chance that a bad input frame coincides with measuring input FPS). (Though I should also power off my monitor so I don't strain it with nonstandard refresh rates.)

Another idea is taking pairs of measurements in a loop until I get two similar vblank duration results (failing after 1-3 non-matching measurements, to avoid looping indefinitely), then taking the median of the previous measurement and the two new ones (or, if there's no cached measurement, taking the first/second/average measurement).

small clock gen adjustments: The idea is to try to keep the attached display happy, by indeed changing the frequency in very small steps and with some wait time. Whether it works or not depends a lot on the display, though. If the display doesn't like it, it will drop sync, go to a black screen, and resync.

My CRT display doesn't lose sync with 0.1% changes in pixel clock. Perhaps I can release the current code as-is, and if anyone complains of losing sync, I'll consider reducing the maximum frequency deviation then.

nyanpasu64 commented 1 year ago

Testing SNES and sync glitches

SNES RGB only reaches ±0.002 frames of latency error, even after setting the output resolution to 480p a second time. This is substantially worse than the ±0.0005 frames I've seen on the Wii, but still an order of magnitude less error than the maximum I'd tolerate (0.02 frames or so). I wonder if this is because the SNES's frame rate drifts more than the Wii's? Or do my code's rounding errors respond less favorably to 262-line than 263-line 240p?

Perhaps if sync gets interrupted to the point that the GBS-C outputs numbers/dots/asterisks, we should set delayLock = 0? I'm not sure how to do that, and it may fail to recognize sync interruptions that don't trigger numbers/dots/asterisks but are large enough to tamper with FPS measurements.

Momentarily tapping my SNES's Reset button usually (not 100% of the time) causes 2 or . to appear on the debug console.

With lockInterval set to 6 frames rather than 100 (to perform frame locking nonstop), resetting my SNES reliably (more often than not) causes my code to misdetect the input frame rate, with readings as low as 25-30 FPS or as high as hundreds to thousands of FPS. This turned out to be extremely valuable in developing a robust system for surviving sync glitches.

Fixing sync glitches

I settled on taking two samples at a time, and rejecting them if either measurement fails or the measurements are too far apart (either 0.5 FPS or a relative difference of 0.5/60, picked somewhat arbitrarily). I decided against implementing a "median of two measurements and old value" system, since it offers no benefit (two incorrect measurements would override the correct old input FPS value) but adds code complexity and state-management failure modes.
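A sketch of that paired-sample check; measureInputFps() is a stand-in for the real blocking measurement, and the tolerance logic is my reading of the rule above:

```cpp
#include <math.h>

bool measureInputFps(float &out); // stand-in for the real measurement

// Take two samples; reject the pair if either measurement fails, or if
// they disagree by more than 0.5 FPS (0.5/60 as a relative difference).
bool sampleStableInputFps(float &fpsOut) {
    float a, b;
    if (!measureInputFps(a) || !measureInputFps(b)) {
        return false; // a measurement failed outright; retry later
    }
    float tolerance = a * (0.5f / 60.0f);
    if (tolerance < 0.5f) tolerance = 0.5f;
    if (fabsf(a - b) > tolerance) {
        return false; // samples disagree; likely a sync glitch
    }
    fpsOut = 0.5f * (a + b);
    return true;
}
```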

Unfortunately, resetting my SNES still sometimes produced two matching incorrect measurements, which are then treated as a new input FPS. (I had previously clamped the maximum deviation from the newly measured input FPS, so when the input FPS changed, the output FPS would swing by massive amounts.) To avoid issues here, I instead clamp the maximum output FPS change to a ±0.001 ratio (±0.1%) from the previous output FPS.
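That clamp is simple; again, the names are illustrative, not the PR's actual code:

```cpp
// Limit each output frequency update to +-0.1% of the previous output
// FPS, so even a bad (but "validated") measurement can't swing it far.
float clampOutputFpsStep(float prevOutputFps, float requestedFps) {
    float lo = prevOutputFps * (1.0f - 0.001f);
    float hi = prevOutputFps * (1.0f + 0.001f);
    if (requestedFps < lo) return lo;
    if (requestedFps > hi) return hi;
    return requestedFps;
}
```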

To keep the system observable (so I know when sync glitches are seen and caught by my FPS watcher code), I made my code print an "FPS excursion detected!" message whenever the detected input FPS passed my validity checks but deviated by 1 FPS or more from the previous output FPS (I don't have the previous input FPS, but the previous output FPS should be close enough).
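Roughly like this (hypothetical names; where exactly the real code prints is up to the debug log plumbing):

```cpp
#include <Arduino.h>
#include <math.h>

// After a measurement passes all validity checks, still flag any jump
// of 1 FPS or more versus the previous output rate, for observability.
void warnOnFpsExcursion(float validatedInputFps, float prevOutputFps) {
    if (fabsf(validatedInputFps - prevOutputFps) >= 1.0f) {
        Serial.println(F("FPS excursion detected!"));
    }
}
```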

At this point, I performed testing using my SNES and Wii, and occasionally saw messages like:

```
runFrequency(): vsyncPeriodAndPhase failed, retrying...
(later)
runFrequency(): getPulseTicks failed, retrying...
```

I haven't dug into what's going wrong, and it's worth debugging. But these don't affect sync quality, since the measurement retries work fine. Even if a retry fails and a frame lock iteration is skipped (which I haven't seen in limited testing), this generally isn't enough to cause latency to drift below zero and cause tearing (unless multiple consecutive frame locks fail on both their measurement and retry, which is exceedingly unlikely).

After performing this testing, I noticed that externalClockGenSyncInOutRate() ignores frame rates outside the range [47, 86] Hz. I decided to implement this in runFrequency() as well. Afterwards, I found that glitchy sync during console resets and 15/31 kHz video mode changes would almost never make it through all the checks and set the output frame rate incorrectly. (In fact, in my brief testing I've never gotten "FPS excursion detected!" to show up, indicating there were no changes in validated input FPS greater than 1 Hz.)
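The gate itself is a one-liner (sketch, with an illustrative name):

```cpp
// Mirror externalClockGenSyncInOutRate(): treat anything outside
// [47, 86] Hz as a glitched measurement and schedule a retry.
bool fpsPlausible(float fps) {
    return fps >= 47.0f && fps <= 86.0f;
}
```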

Further ideas for improving handling of sync glitches

Rebase

I force-pushed a cleaner master branch (no longer disabling OLED) to my fork, created PR #406, and rebased this branch on the new master. I hope my changes in master or this PR didn't break the OLED menu I had previously disabled.

nyanpasu64 commented 1 year ago

Since you squash-merged #406, I rebased and force-pushed on master. There are no code changes.

maybe I should add a comment OutputBypass = 10, // Pass Through in a subsequent PR? Not sure.

If you're going to squash-merge this PR too, I think it's worth adding this in a separate PR (or you could push a commit yourself, to avoid one-liner low-utility PRs) to avoid cluttering the squashed diff. Though if you disagree, I can add the comment here, or drop it entirely (it's not absolutely critical to add).


I've been playing Wind Waker to test steady-state operation, and have found no tear lines so far (though admittedly it's harder to see tear lines in 30 fps games), and all prior testing suggests extremely stable video latency (though I don't have my "line in recording" latency measurement rig set up, and have disabled debug printouts, so I can't verify this is the case in my current play session).


Will the docs need to be updated to say that frame time lock is now useful with a clock generator installed?

ramapcsx2 commented 1 year ago

I've sent you an email :)

nyanpasu64 commented 1 year ago

Is it possible to eventually implement output-frequency-based latency control without an external clock generator? I don't know. Even after looking over the register/programming PDFs, I'm not sure what modes the TV5725 has available to drive the input ADC clock or output DAC clock together or separately, with or without a clock generator, and whether you can fine-tune the output frequency without a clock generator to adjust latency.

I had hopes that PLL_R/S and PLLAD_R/S would help, but it seems your code already sets those flags, and they control the input sampling rate per line rather than the output frame rate? Not sure.


EDIT: During testing, I rebooted the GBS-C in 1280x960 mode, then an unknown amount of time later exited from 240p Test Suite (240p) to the Homebrew Channel (480p) (or possibly exited earlier, with the change only applying after the reboot finished). This apparently crashed the ESP (the web UI displayed a red disconnected sign permanently, until I reset the board), and left the video scaler outputting 480p YPbPr (nearly all green on a VGA monitor), offset to the left.

(attached image: IMG_20230201_035009)

```
Restart
user command a at settings source 1, custom slot 65, status 4
source Hz: 59.82494 new out: 59.82494 clock: 161973344 (-26656)

.
2345678
Format change: 13 <stable>
clock gen reset: 161973568
ADC offset: R:44 G:44 B:42
clock gen reset: 161973568
(hang)
```

I have no clue whether it's related to this PR, the restart sequence, power delivery, etc. I was unable to reproduce this bug, and didn't know how to debug the crash (especially since I didn't have the ESP connected to my PC via USB).

https://arduino-esp8266.readthedocs.io/en/latest/faq/a02-my-esp-crashes.html

I've been able to get a temporary green corrupted screen (but no ESP crash) by switching to 480p right after the ESP's LED turns on following a reboot, or by exiting from 240p Test Suite half a second after rebooting and waiting for the video output to go blank.

ramapcsx2 commented 1 year ago

The Wii does that sometimes, IIRC. It doesn't like the 240p test suite, is all I remember.

nyanpasu64 commented 1 year ago

Decided to do further testing with a Dell U2312HM LCD monitor, as a stand-in for LCD monitors/televisions.

I connected my GBS-C's inputs to a Wii (via component) and a SNES (via RGB), set my GBS-C to output VGA to my monitor at 1080p, then started experimenting with various video modes and signal interruptions. I tested powering on my Wii to HBC in 480p mode (random startup latency/phase), exiting from HBC to the Wii Menu (480p, randomizing input phase after console power-on), and powering on and resetting my SNES console (240p startup random phase, and randomizing phase mid-signal).

Results

It appears that with the GBS-C-to-LCD signal set to 1080p, the LCD monitor is especially sensitive to changes in input frequency. Anything causing a large increase (I haven't seen a decrease do this) in GBS-C output frequency will cause the screen to go black for ~1.5 seconds before resyncing. Afterwards, the monitor apparently "re-centers" on the newer frequency, and can freely switch between the old (low) and new (high) frequencies without losing sync. The monitor loses sync again if you increase the frequency further past the high frequency without power-cycling, or if you switch to the low frequency, power-cycle the monitor, then raise the frequency again.

With the GBS-C in frequency-based frame-time lock mode (this PR), I've been able to trigger sync loss sometimes by turning on the SNES or Wii, sometimes by resetting the SNES or exiting the Wii to the System Menu (in frame-time lock), and usually (always?) by switching Wii 240p Test Suite between 240p and 480i. (I'm unsure whether entering/exiting 240p Test Suite causes a frequency change, as the GBS-C itself loses input sync when the Wii switches between 15 and 31 kHz signals.)

Even with frame-time lock disabled, switching the Wii from 240p (263 lines, 59.82 Hz) to 480i (262.5 lines, 59.94 Hz) is always enough to cause my monitor to lose sync. (Oddly, switching the other way, reducing the frequency, hasn't yet caused my monitor to lose sync.)

Sidenote: I initially suspected the monitor lost sync because I changed the video output frequency too quickly. externalClockGenSyncInOutRate included code to gradually ramp the output rate up or down by 1 kHz at a time, and I initially thought 240p-480i transitions did not lose sync. However, removing the frequency ramp did not cause 240p-480i transitions to lose sync; power-cycling the monitor did. Additionally, porting the frequency ramp to FrameSync::runFrequency did not fix my monitor losing sync. Even after I switched FrameSync::runFrequency to change the output frequency by only 0.05% at a time, my monitor still lost sync, just after two 0.05% changes rather than one 0.1% change.

I've been able to get my monitor to lose sync twice in a row without power-cycling in between: hard-reset the GBS-C to create input latency (with External Clock Generator on but FrameTime Lock off), power-cycle the monitor while showing 240p Test Suite (59.82 Hz), switch the Wii to output 480i (59.94 Hz, so externalClockGenSyncInOutRate raises the output frequency and breaks sync), then enable FrameTime Lock (raising the frequency even further, temporarily to 60 Hz, and breaking sync again). Oddly, after my monitor locks onto 60 Hz, it doesn't lose sync when switching back down to 240p (59.825 Hz input rate, and as low as 59.813 Hz output rate for one iteration, to increase latency towards the target).

I decided to check how my monitor behaves at other VGA frequencies. When testing 240p-480i transitions, it seems my monitor can tolerate both input frequency increases and decreases when receiving VGA 480p, and can tolerate neither increases nor decreases at 720p or 960p. I'm not sure how other LCD monitors behave. (My CRT has no problems with frequency changes at any resolution I've tested it with.)

(Unrelated observation: 240p Test Suite displayed a wholly scrambled image that persisted across output resolution changes... perhaps because I had switched resolutions with no input signal, and the GBS-C "helpfully" disabled the sync watcher and enabled debug mode, and I hadn't known to turn the sync watcher back on before switching from 480p to 240p. Can I disable this code branch so it doesn't bite me, and likely others?)

Next steps

Not much we can do about the monitor losing sync. Workarounds:

I'm going to keep the frequency ramp, because it hopefully prevents the TV5725 from displaying glitched output. In the code without a frequency ramp, I once saw half a scanline of corrupted output, which coincided with a log message describing an instantaneous 0.1% frequency change.
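For reference, a sketch of the kind of ramp I mean; setClockGenFrequency() is a stand-in for the actual clock generator driver call, and the step/delay values are illustrative:

```cpp
#include <Arduino.h>

void setClockGenFrequency(uint32_t hz); // stand-in for the real driver call

// Step the clock generator toward the target in 1 kHz increments with a
// short settle delay, instead of jumping there in one write.
void rampClockGen(uint32_t currentHz, uint32_t targetHz) {
    const int32_t stepHz = 1000; // matches externalClockGenSyncInOutRate's ramp
    while (currentHz != targetHz) {
        int32_t diff = (int32_t)targetHz - (int32_t)currentHz;
        if (diff > stepHz) diff = stepHz;
        if (diff < -stepHz) diff = -stepHz;
        currentHz += diff;
        setClockGenFrequency(currentHz);
        delay(1); // brief wait so the display and scaler can track the change
    }
}
```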

ramapcsx2 commented 1 year ago

Oh, I think you've found a particularly sensitive display then. This is the thing: nearly every other monitor / TV / collectively "sink" will behave differently. This was annoying to no end when I initially wrote this and dialed in settings...

Priority for where sync drops should be avoided is on 240p/480i switches, as that is a common occurrence when playing games on some systems. The PSX has this bad, so it's a really good test console (has the 240p test suite, too :)).

Next down the priority list is what happens when the console resets its GPU (the PSX is again great here), which would happen on a console reset, for example. I don't think all sync losses can be avoided here, but sometimes a monitor will endure this fine, just showing a black picture with a few glitches for a bit.

And then furthest down is obvious drastic source timing changes. Sync can be allowed to drop then.

You are asking what to prefer in situations where sync loss is unavoidable: I think you're on the right track there. If sync will be lost, then it would be best if we get a stable, retimed output back as soon as possible. It will be better to re-time everything at once, provided that the code has first determined the new source timings are stable. The PSX is again a good example, where a PAL console will start the GPU in NTSC mode (!) on boot, then switch to PAL a few millis later. A CRT TV will not get too upset by this, but every scaler I know struggles with this a lot; even a RetroTink 5X hates it.

Frequency ramping works well on many displays, and I would recommend staying close to the ramp steps / timings that I had. They worked across my test gear selection.

nyanpasu64 commented 1 year ago

Priority for where sync drops should be avoided is on 240p/480i switches, as that is a common occurrence when playing games on some systems. The PSX has this bad, so it's a really good test console (has the 240p test suite, too :)).

When the input frame rate changes, the output frame rate must change. I think there's flat-out no way to make my LCD at 720p/960p accept 240p/480i transitions, outside of hacks like transmitting a video signal halfway between the 240p and 480i refresh rates, waiting for the monitor to sync (and hoping you don't overflow 0-1 frames of latency), then transmitting the real refresh rate afterwards.

I think you're on the right track there. If sync will be lost, then it would be best if we get a stable, retimed output back as soon as possible. It will be better to re-time everything at once, provided that the code has first determined the new source timings are stable.

The current code is not perfect (it probably takes longer than ideal to respond to console resets, due to lastVsyncLock), but since I restricted it to a ±0.06% output frequency swing, it does not cause my monitor to lose sync at all. It also takes longer than ideal to respond to 240p/480i switches, but I consider that out of scope for this PR, since I did not write or change that code in this PR (and don't even understand it).

Frequency ramping works well on many displays, and I would recommend staying close to the ramp steps / timings that I had. They worked across my test gear selection.

Are you talking about externalClockGenSyncInOutRate adjusting the clock generator by 1 kHz at a time? If you think the current formula doesn't need changing, I'll keep it as-is.

Testing old frame time lock

Are there any TVs or monitors that can't accept even VGA vsync+hsync rate changes as small as 0.06%? (I suspect this is very rare, since you say I have "a particularly sensitive display", and even it can handle 0.075% without problems.)

If so, it may be worthwhile to turn off the external clock generator when using frame time lock (bypassing this PR entirely). Frame time lock will then change vtotal rather than the pixel clock and hsync rate, which some monitors tolerate better. For example, with internal-only frame time lock, my LCD doesn't shift the image vertically at all in Method 1, and doesn't lose sync when the frame rate changes, even at the ever-picky 960p resolution.

I'd keep the external clock generator on for displays like my CRT, which tolerates frequency changes but offsets the image upon line count changes (in both FrameTime Lock methods!). Frequency-based frame-time lock also has the benefit of converging to a perfectly stable latency, rather than vtotal-based frame-time lock causing latency to jitter by significant fractions of a frame in steady-state operation (I think; see below), so I might even use it on my LCD.

Priority for where sync drops should be avoided is on 240p/480i switches

Unfortunately, with "FrameTime Lock" enabled and the external clock generator disabled, my monitor still loses sync (at 960p) just under a second after the GBS-C input switches between 240p and 480i! This is both the most important case of input frame rate change and one that "FrameTime Lock" does not fix (and perhaps cannot easily fix).

nyanpasu64 commented 1 year ago

LCD gameplay play-testing

I spent a few days gaming on my Wii, outputting 480p component into a GBS-C upscaling to 960p (with clock generator and frame sync enabled), hooked up to my LCD.

For the later part of my testing, I tried decreasing video latency: I set syncTargetPhase to 36 (36/360 = 0.1 frame) and reflashed my GBS-C. I am not making this change in this PR, as it requires further testing with 240p/480i transitions, and further changes to be production-ready:

Low-latency measurements

I then plugged the GBS-C back into my CRT and checked the resulting latency using a solar panel and audio interface (on the Wii Home screen, because I'm lazy).

Data at 960p output 0.1 frame target.aup3.zip.

Results:

Conclusion

At this point I'd say this code is reliable and well-tested (though I haven't had other people verify it on their devices), and I'm not opposed to merging.

There remain code-level nitpicks; for example, I shouldn't have added separate FS_DEBUG and FRAMESYNC_DEBUG flags (I think these can be unified, since FS_DEBUG mostly affects vtotal sync and FRAMESYNC_DEBUG mostly affects frequency sync). But I don't particularly care, since they don't affect users, don't block the critical path of maintenance, and should be easy to resolve.

ramapcsx2 commented 1 year ago

I can say that the 1080p output presets are at the limit of the chip, and indeed compromised. I couldn't get a nice aspect ratio while keeping sharp scaling, so whatever is there now is some form of compromise, not a preferred choice :p

Latency sounds good. Basically, all latencies below 16.6 ms / one frame are a great result, and it doesn't matter whether it's closer to 4 ms or to 2 ms. Latency only becomes an issue once it adds up to perceivable lag, which would be in the several-frames range.

If you want to preserve more of this work's history, a merge commit is fine. If it doesn't matter much to you, squash is fine :)