ramapcsx2 / gbs-control


Maintain constant latency with clock generator installed #405

Closed nyanpasu64 closed 1 year ago

nyanpasu64 commented 1 year ago

This PR continuously adjusts the video output frame rate to keep video output a fixed distance (currently 1/4 frame) behind input, when "FrameTime Lock" is checked and an external clock generator is installed. This fixes the bug where, with a clock generator installed, GBS-Control would initialize input-output latency to a random value, and the latency would then slowly drift over time, eventually producing a tear line between images 0 and 1 frames old. (Technically the initial latency is still random, but latency is reliably brought to 1/4 frame within seconds.)

The latency control algorithm used is more primitive than the one I described in https://github.com/ramapcsx2/gbs-control/issues/286#issuecomment-1401230470; instead, we use the clock generator to offset the output frequency from the estimated input frequency by a factor proportional to the latency error (clamped to ±0.001 to avoid large frequency offsets). And I'm solely using blocking latency measurements every 100 frames (plus overhead), like your older frame sync code, rather than rewriting the code further to add continuous interrupt-driven non-blocking measurements.
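For concreteness, here is a minimal sketch of that control law as a simple proportional controller; the function name, signature, and gain value are mine for illustration, not the actual PR code:

```cpp
// Minimal sketch of the proportional latency controller described above.
// Names and the gain value are illustrative, not the actual PR code.
// latencyErrorFrames = measured latency minus the target latency
// (syncTargetPhase / 360 = 0.25 frames), in fractions of a frame.
float computeOutputFps(float estimatedInputFps, float latencyErrorFrames) {
    const float gain = 0.01f; // proportional gain (arbitrary here)
    float offsetRatio = gain * latencyErrorFrames;
    // Clamp to +-0.001 (+-0.1%) to avoid large frequency offsets.
    if (offsetRatio > 0.001f) offsetRatio = 0.001f;
    if (offsetRatio < -0.001f) offsetRatio = -0.001f;
    // Output runs slightly fast while it lags the target (positive error),
    // slightly slow while it leads, so latency converges on the target.
    return estimatedInputFps * (1.0f + offsetRatio);
}
```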

But the results are surprisingly good. If you turn on FRAMESYNC_DEBUG, the logs show my code consistently brings latency to a steady state within 0.0005 (5 × 10^-4) frames of FrameSyncAttrs::syncTargetPhase (90 degrees = 1/4 frame behind input), starting from console power-on, input resolution changes (240p/480i/480p), or after repeatedly changing the output resolution to "480p 576p" to deliberately induce extra latency. I did not test replugging the GBS-C, but I see no reason it wouldn't work well too. I tested primarily at 480p output, but 960p output seems to work fine too.

I also measured video latency (at 480p output) by feeding the input luma signal (through a Y-splitter) and a solar panel at the top of my CRT into the left and right channels of my computer's audio line-in, then recording at 192 kHz. This showed a latency of just over 4 milliseconds from video input to output, consistent with the debug logging.

In "Pass Through" mode, frame sync is unnecessary. My logs show that it is never attempted.

This is a working prototype, but not ready for upstreaming.

Issues

Not bugs, but room for improvement

(I've spent long enough poring over my code and reading/revising this post over and over...)

Fixes #286.

ramapcsx2 commented 1 year ago

commit history: simple, I don't care. At all. The history is a great mess already; how much worse can it get? ;p

findBestHTotal: This comes up in a few variations. Basically, I don't think this is meant to run (or make changes) with the clock gen installed. If it does, then it was for side effects, or to keep custom settings in presets.

Do you use a debugger: Nope, feel free to add support, if you want :)

freak measurements: Comes up in a few variations. Basically, if you measure 77 Hz on the source, and it's just a 480p Wii, that's wrong :p It needs to be ignored and another measurement scheduled; maybe the sync situation isn't stable...

when to take VSync period measurements: Experiment and see what you get :)

small clock gen adjustments: The idea is to try to keep the attached display happy, by indeed changing the frequency in very small steps and with some wait time. Whether it works or not depends a lot on the display, though. If the display doesn't like it, it will drop sync, go to a black screen, and resync.

volatile: These are not ideal, but you're guaranteed that reads and writes are atomic on this architecture (no read while it's being written). Volatiles do not prevent the compiler from reordering or otherwise breaking the intent, though. Proper atomic variables would solve it (but are a bit tricky to use). Basically, I just assumed they work fine until proven otherwise.
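For illustration, a small sketch of the difference, using hypothetical flag names (not gbs-control's actual variables):

```cpp
#include <atomic>

// volatile only guarantees the access itself isn't optimized away; it
// does not stop the compiler from reordering surrounding memory ops.
volatile bool frameFlagVolatile = false;

// std::atomic gives atomic access, and its memory ordering also
// constrains reordering around the access.
std::atomic<bool> frameFlagAtomic{false};

void onVsyncInterrupt() { // hypothetical ISR
    frameFlagAtomic.store(true, std::memory_order_release);
}

bool consumeFrameFlag() { // called from the main loop
    return frameFlagAtomic.exchange(false, std::memory_order_acquire);
}
```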

ramapcsx2 commented 1 year ago

Broadly though, I want to encourage you to try your new code with various sources, and if it seems to work, that's good enough for me. It's going to be an improvement!

nyanpasu64 commented 1 year ago

https://github.com/ramapcsx2/gbs-control/issues/286#issuecomment-1407641143

If you can run a few more sources (SNES and PSX, for example, cover a broad range), all the better.

I'm going to test my PR on SNES later today. I don't have a PSX, but running 240p/480i transitions in 240p Test Suite GX showed (EDIT: in further testing) up to 0.053633 frames of latency change, or 0.064307 frames if I time a transition during the beginning of a frame sync measurement, which is well under the 0.25 frames of target latency. In fact I think even 0.1 frames of target latency would be (EDIT: marginally) safe even in games switching between these video modes, but I'm not 100% confident there would never be tear lines at that latency.

If fixes and improvements can be broken up into smaller PRs, then that's good, but sometimes everything is interrelated.

Ideally I'd remove the compile_commands.json and restore the OLED menu. At that point, I'll have to decide whether to split out my earlier changes ("Rename serialCommand and userCommand", "Remove commented-out code blocking code folding", enum OutputMode, console.log("GBSControl.ws.onmessage()"); (still not sure what I did in ae28a351f7d9668ba9164b54f824ecd84dc3740d)) into a separate PR.


freak measurements: Comes up in a few variations. Basically, if you measure 77 Hz on the source, and it's just a 480p Wii, that's wrong :p

I'm thinking that running the frame lock every 6 frames instead of every 100, in a local debug build, will make the incorrect FPS measurements much more reproducible (a much higher chance that a bad input frame coincides with measuring input FPS). (Though I should also power off my monitor so I don't strain it with nonstandard refresh rates.)

Another idea is taking pairs of measurements in a loop until I get two similar vblank duration results (failing after 1-3 non-matching measurements, to avoid looping indefinitely), then taking the median of the previous measurement and the two new ones (or, if there's no cached measurement, taking the first/second/average measurement).

small clock gen adjustments: The idea is to try to keep the attached display happy, by indeed changing the frequency in very small steps and with some wait time. Whether it works or not depends a lot on the display, though. If the display doesn't like it, it will drop sync, go to a black screen, and resync.

My CRT display doesn't lose sync with 0.1% changes in pixel clock. Perhaps I can release the current code as-is, and if anyone complains of losing sync, I'll consider reducing the maximum frequency deviation then.

nyanpasu64 commented 1 year ago

Testing SNES and sync glitches

SNES RGB only reaches ±0.002 frames of latency error, even after setting the output resolution to 480p a second time. This is substantially worse than the ±0.0005 frames I've seen on the Wii, but still an order of magnitude less error than the maximum I'd tolerate (0.02 frames or so). I wonder if this is because the SNES's frame rate drifts more than the Wii's? Or do my code's rounding errors respond less favorably to 262-line than 263-line 240p?

Perhaps if sync gets interrupted to the point that the GBS-C outputs numbers/dots/asterisks, we should set delayLock = 0? I'm not sure how to do that, and it may fail to recognize sync interruptions that don't trigger numbers/dots/asterisks but are large enough to tamper with FPS measurements.

Momentarily tapping my SNES's Reset button usually (not 100% of the time) causes 2 or . to appear on the debug console.

With lockInterval set to 6 frames rather than 100 (to perform frame locking nonstop), resetting my SNES reliably (more often than not) causes my code to misdetect the input frame rate, with readings as low as 25-30 FPS or as high as hundreds to thousands of FPS. This turned out to be extremely valuable in developing a robust system for surviving sync glitches.

Fixing sync glitches

I settled on taking two samples at a time, and rejecting them if either measurement fails or the measurements are too far apart (either 0.5 FPS or a relative difference of 0.5/60, picked somewhat arbitrarily). I decided against implementing a "median of two measurements and old value" system, since it offers no benefit (two incorrect measurements would override the correct old input FPS value) but adds code complexity and state-management failure modes.
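A sketch of that paired-sample check; measureInputFps() is a stand-in for the real blocking measurement, and the tolerance logic is my reading of the rule above:

```cpp
#include <math.h>

bool measureInputFps(float &out); // stand-in for the real measurement

// Take two samples; reject the pair if either measurement fails, or if
// they disagree by more than 0.5 FPS (0.5/60 as a relative difference).
bool sampleStableInputFps(float &fpsOut) {
    float a, b;
    if (!measureInputFps(a) || !measureInputFps(b)) {
        return false; // a measurement failed outright; retry later
    }
    float tolerance = a * (0.5f / 60.0f);
    if (tolerance < 0.5f) tolerance = 0.5f;
    if (fabsf(a - b) > tolerance) {
        return false; // samples disagree; likely a sync glitch
    }
    fpsOut = 0.5f * (a + b);
    return true;
}
```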

Unfortunately, resetting my SNES still sometimes produced two matching incorrect measurements, which are then treated as a new input FPS. (I had previously clamped the maximum deviation from the newly measured input FPS, so when the input FPS changed, the output FPS would swing by massive amounts.) To avoid issues here, I instead clamp the maximum output FPS change to a ±0.001 ratio (±0.1%) from the previous output FPS.
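That clamp is simple; again, the names are illustrative, not the PR's actual code:

```cpp
// Limit each output frequency update to +-0.1% of the previous output
// FPS, so even a bad (but "validated") measurement can't swing it far.
float clampOutputFpsStep(float prevOutputFps, float requestedFps) {
    float lo = prevOutputFps * (1.0f - 0.001f);
    float hi = prevOutputFps * (1.0f + 0.001f);
    if (requestedFps < lo) return lo;
    if (requestedFps > hi) return hi;
    return requestedFps;
}
```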

To keep the system observable (so I know when sync glitches are seen and caught by my FPS watcher code), I made my code print an "FPS excursion detected!" message whenever the detected input FPS passed my validity checks but deviated by 1 FPS or more from the previous output FPS (I don't have the previous input FPS, but the previous output FPS should be close enough).
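Roughly like this (hypothetical names; where exactly the real code prints is up to the debug log plumbing):

```cpp
#include <Arduino.h>
#include <math.h>

// After a measurement passes all validity checks, still flag any jump
// of 1 FPS or more versus the previous output rate, for observability.
void warnOnFpsExcursion(float validatedInputFps, float prevOutputFps) {
    if (fabsf(validatedInputFps - prevOutputFps) >= 1.0f) {
        Serial.println(F("FPS excursion detected!"));
    }
}
```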

At this point, I performed testing using my SNES and Wii, and occasionally saw messages like:

```
runFrequency(): vsyncPeriodAndPhase failed, retrying...
(later)
runFrequency(): getPulseTicks failed, retrying...
```

I haven't dug into what's going wrong, and it's worth debugging. But these don't affect sync quality, since the measurement retries work fine. Even if a retry fails and a frame lock iteration is skipped (which I haven't seen in limited testing), this generally isn't enough to cause latency to drift below zero and cause tearing (unless multiple consecutive frame locks fail on both their measurement and retry, which is exceedingly unlikely).

After performing this testing, I noticed that externalClockGenSyncInOutRate() ignores frame rates outside the range [47, 86] Hz. I decided to implement this in runFrequency() as well. Afterwards, I found that glitchy sync during console resets and 15/31 kHz video mode changes would almost never make it through all the checks and set the output frame rate incorrectly. (In fact, in my brief testing I've never gotten "FPS excursion detected!" to show up, indicating there were no changes in validated input FPS greater than 1 Hz.)
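The gate itself is a one-liner (sketch, with an illustrative name):

```cpp
// Mirror externalClockGenSyncInOutRate(): treat anything outside
// [47, 86] Hz as a glitched measurement and schedule a retry.
bool fpsPlausible(float fps) {
    return fps >= 47.0f && fps <= 86.0f;
}
```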

Further ideas for improving handling of sync glitches

Rebase

I force-pushed a cleaner master branch (no longer disabling OLED) to my fork, created PR #406, and rebased this branch on the new master. I hope my changes in master or this PR didn't break the OLED menu I had previously disabled.

nyanpasu64 commented 1 year ago

Since you squash-merged #406, I rebased and force-pushed on master. There are no code changes.

maybe I should add a comment OutputBypass = 10, // Pass Through in a subsequent PR? Not sure.

If you're going to squash-merge this PR too, I think it's worth adding this in a separate PR (or you could push a commit yourself, to avoid one-liner low-utility PRs) to avoid cluttering the squashed diff. Though if you disagree, I can add the comment here, or drop it entirely (it's not absolutely critical to add).


I've been playing Wind Waker to test steady-state operation, and have found no tear lines so far (though admittedly it's harder to see tear lines in 30 fps games), and all prior testing suggests extremely stable video latency (though I don't have my "line in recording" latency measurement rig set up, and have disabled debug printouts, so I can't verify this is the case in my current play session).


Will the docs need to be updated to say that frame time lock is now useful with a clock generator installed?

ramapcsx2 commented 1 year ago

I've sent you an email :)

nyanpasu64 commented 1 year ago

Is it possible to eventually implement output-frequency-based latency control without an external clock generator? I don't know. Even after looking over the register/programming PDFs, I'm not sure what modes the TV5725 has available to drive the input ADC clock or output DAC clock together or separately, with or without a clock generator, and whether you can fine-tune the output frequency without a clock generator to adjust latency.

I had hopes that PLL_R/S and PLLAD_R/S would help, but it seems your code already sets those flags, and they control the input sampling rate per line rather than the output frame rate? Not sure.


EDIT: During testing, I rebooted the GBS-C in 1280x960 mode, then an unknown amount of time later exited from 240p Test Suite (240p) to the Homebrew Channel (480p) (or possibly exited earlier, with the change only applying after the reboot finished). This apparently crashed the ESP (the web UI displayed a red disconnected sign permanently, until I reset the board), and left the video scaler outputting 480p YPbPr (nearly all green on a VGA monitor), offset to the left.

(attached image: IMG_20230201_035009)

```
Restart
user command a at settings source 1, custom slot 65, status 4
source Hz: 59.82494 new out: 59.82494 clock: 161973344 (-26656)

.
2345678
Format change: 13 <stable>
clock gen reset: 161973568
ADC offset: R:44 G:44 B:42
clock gen reset: 161973568
(hang)
```

I have no clue whether it's related to this PR, the restart sequence, power delivery, etc. I was unable to reproduce this bug, and didn't know how to debug the crash (especially since I didn't have the ESP connected to my PC via USB).

https://arduino-esp8266.readthedocs.io/en/latest/faq/a02-my-esp-crashes.html

I've been able to get a temporary green corrupted screen (but no ESP crash) by switching to 480p right after the ESP's LED turns on following a reboot, or by exiting from 240p Test Suite half a second after rebooting and waiting for the video output to go blank.

ramapcsx2 commented 1 year ago

The Wii does that sometimes, IIRC. It doesn't like the 240p test suite, is all I remember.

nyanpasu64 commented 1 year ago

Decided to do further testing with a Dell U2312HM LCD monitor, as a stand-in for LCD monitors/televisions.

I connected my GBS-C's inputs to a Wii (via component) and a SNES (via RGB), set my GBS-C to output VGA to my monitor at 1080p, then started experimenting with various video modes and signal interruptions. I tested powering on my Wii to HBC in 480p mode (random startup latency/phase), exiting from HBC to the Wii Menu (480p, randomizing input phase after console power-on), and powering on and resetting my SNES console (240p startup random phase, and randomizing phase mid-signal).

Results

It appears that with the GBS-C-to-LCD signal set to 1080p, the LCD monitor is especially sensitive to changes in input frequency. Anything causing a large increase (I haven't seen a decrease do this) in GBS-C output frequency will cause the screen to go black for ~1.5 seconds before resyncing. Afterwards, the monitor apparently "re-centers" on the newer frequency, and can freely switch between the old (low) and new (high) frequencies without losing sync. The monitor loses sync again if you increase the frequency further past the high frequency without power-cycling, or if you switch to the low frequency, power-cycle the monitor, then raise the frequency again.

With the GBS-C in frequency-based frame-time lock mode (this PR), I've been able to trigger sync loss sometimes by turning on the SNES or Wii, sometimes by resetting the SNES or exiting the Wii to the System Menu (in frame-time lock), and usually (always?) by switching Wii 240p Test Suite between 240p and 480i. (I'm unsure whether entering/exiting 240p Test Suite causes a frequency change, as the GBS-C itself loses input sync when the Wii switches between 15 and 31 kHz signals.)

Even with frame-time lock disabled, switching the Wii from 240p (263 lines, 59.82 Hz) to 480i (262.5 lines, 59.94 Hz) is always enough to cause my monitor to lose sync. (Oddly, switching the other way, reducing the frequency, hasn't yet caused my monitor to lose sync.)

Sidenote: I initially suspected the monitor lost sync because I changed the video output frequency too quickly. externalClockGenSyncInOutRate included code to gradually ramp the output rate up or down by 1 kHz at a time, and I initially thought 240p-480i transitions did not lose sync. However, removing the frequency ramp did not cause 240p-480i transitions to lose sync; power-cycling the monitor did. Additionally, porting the frequency ramp to FrameSync::runFrequency did not fix my monitor losing sync. Even after I switched FrameSync::runFrequency to change the output frequency by only 0.05% at a time, my monitor still lost sync, just after two 0.05% changes rather than one 0.1% change.

I've been able to get my monitor to lose sync twice in a row without power-cycling in between: hard-reset the GBS-C to create input latency (with External Clock Generator on but FrameTime Lock off), power-cycle the monitor while showing 240p Test Suite (59.82 Hz), switch the Wii to output 480i (59.94 Hz, so externalClockGenSyncInOutRate raises the output frequency and breaks sync), then enable FrameTime Lock (raising the frequency even further, temporarily to 60 Hz, and breaking sync again). Oddly, after my monitor locks onto 60 Hz, it doesn't lose sync when switching back down to 240p (59.825 Hz input rate, and as low as 59.813 Hz output rate for one iteration, to increase latency towards the target).

I decided to check how my monitor behaves at other VGA frequencies. When testing 240p-480i transitions, it seems my monitor can tolerate both input frequency increases and decreases when receiving VGA 480p, and can tolerate neither increases nor decreases at 720p or 960p. I'm not sure how other LCD monitors behave. (My CRT has no problems with frequency changes at any resolution I've tested it with.)

(Unrelated observation: 240p Test Suite displayed a wholly scrambled image that persisted across output resolution changes... perhaps because I had switched resolutions with no input signal, and the GBS-C "helpfully" disabled the sync watcher and enabled debug mode, and I hadn't known to turn the sync watcher back on before switching from 480p to 240p. Can I disable this code branch so it doesn't bite me, and likely others?)

Next steps

Not much we can do about the monitor losing sync. Workarounds:

I'm going to keep the frequency ramp, because it hopefully prevents the TV5725 from displaying glitched output. In the code without a frequency ramp, I once saw half a scanline of corrupted output, which coincided with a log message describing an instantaneous 0.1% frequency change.
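For reference, a sketch of the kind of ramp I mean; setClockGenFrequency() is a stand-in for the actual clock generator driver call, and the step/delay values are illustrative:

```cpp
#include <Arduino.h>

void setClockGenFrequency(uint32_t hz); // stand-in for the real driver call

// Step the clock generator toward the target in 1 kHz increments with a
// short settle delay, instead of jumping there in one write.
void rampClockGen(uint32_t currentHz, uint32_t targetHz) {
    const int32_t stepHz = 1000; // matches externalClockGenSyncInOutRate's ramp
    while (currentHz != targetHz) {
        int32_t diff = (int32_t)targetHz - (int32_t)currentHz;
        if (diff > stepHz) diff = stepHz;
        if (diff < -stepHz) diff = -stepHz;
        currentHz += diff;
        setClockGenFrequency(currentHz);
        delay(1); // brief wait so the display and scaler can track the change
    }
}
```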

ramapcsx2 commented 1 year ago

Oh, I think you've found a particularly sensitive display then. This is the thing: nearly every other monitor / TV / collectively "sink" will behave differently. This was annoying to no end when I initially wrote this and dialed in settings...

Priority for where sync drops should be avoided is on 240p/480i switches, as that is a common occurrence when playing games on some systems. The PSX has this bad, so it's a really good test console (has the 240p test suite, too :)).

Next down the priority list is what happens when the console resets its GPU (the PSX is again great here), which would happen on a console reset, for example. I don't think all sync losses can be avoided here, but sometimes a monitor will endure this fine, just showing a black picture with a few glitches for a bit.

And then furthest down is obvious drastic source timing changes. Sync can be allowed to drop then.

You are asking what to prefer in situations where sync loss is unavoidable: I think you're on the right track there. If sync will be lost, then it would be best if we get a stable, retimed output back as soon as possible. It will be better to re-time everything at once, provided that the code has first determined the new source timings are stable. The PSX is again a good example, where a PAL console will start the GPU in NTSC mode (!) on boot, then switch to PAL a few millis later. A CRT TV will not get too upset by this, but every scaler I know struggles with this a lot; even a RetroTink 5X hates it.

Frequency ramping works well on many displays, and I would recommend staying close to the ramp steps / timings that I had. They worked across my test gear selection.

nyanpasu64 commented 1 year ago

Priority for where sync drops should be avoided is on 240p/480i switches, as that is a common occurrence when playing games on some systems. The PSX has this bad, so it's a really good test console (has the 240p test suite, too :)).

When the input frame rate changes, the output frame rate must change. I think there's flat-out no way to make my LCD at 720p/960p accept 240p/480i transitions, outside of hacks like transmitting a video signal halfway between the 240p and 480i refresh rates, waiting for the monitor to sync (and hoping you don't overflow 0-1 frames of latency), then transmitting the real refresh rate afterwards.

I think you're on the right track there. If sync will be lost, then it would be best if we get a stable, retimed output back as soon as possible. It will be better to re-time everything at once, provided that the code has first determined the new source timings are stable.

The current code is not perfect (it probably takes longer than ideal to respond to console resets, due to lastVsyncLock), but since I restricted it to a ±0.06% output frequency swing, it does not cause my monitor to lose sync at all. It also takes longer than ideal to respond to 240p/480i switches, but I consider that out of scope for this PR, since I did not write or change that code in this PR (and don't even understand it).

Frequency ramping works well on many displays, and I would recommend staying close to the ramp steps / timings that I had. They worked across my test gear selection.

Are you talking about externalClockGenSyncInOutRate adjusting the clock generator by 1 kHz at a time? If you think the current formula doesn't need changing, I'll keep it as-is.

Testing old frame time lock

Are there any TVs or monitors that can't accept even VGA vsync+hsync rate changes as small as 0.06%? (I suspect this is very rare, since you say I have "a particularly sensitive display", and even it can handle 0.075% without problems.)

If so, it may be worthwhile to turn off the external clock generator when using frame time lock (bypassing this PR entirely). Frame time lock will then change vtotal rather than the pixel clock and hsync rate, which some monitors tolerate better. For example, with internal-only frame time lock, my LCD doesn't shift the image vertically at all in Method 1, and doesn't lose sync when the frame rate changes, even at the ever-picky 960p resolution.

I'd keep the external clock generator on for displays like my CRT, which tolerates frequency changes but offsets the image upon line count changes (in both FrameTime Lock methods!). Frequency-based frame-time lock also has the benefit of converging to a perfectly stable latency, rather than vtotal-based frame-time lock causing latency to jitter by significant fractions of a frame in steady-state operation (I think; see below), so I might even use it on my LCD.

Priority for where sync drops should be avoided is on 240p/480i switches

Unfortunately, with "FrameTime Lock" enabled and the external clock generator disabled, my monitor still loses sync (at 960p) just under a second after the GBS-C input switches between 240p and 480i! This is both the most important case of input frame rate change and one that "FrameTime Lock" does not fix (and perhaps cannot easily fix).

nyanpasu64 commented 1 year ago

LCD gameplay play-testing

I spent a few days gaming on my Wii, outputting 480p component into a GBS-C upscaling to 960p (with clock generator and frame sync enabled), hooked up to my LCD.

For the later part of my testing, I tried decreasing video latency: I set syncTargetPhase to 36 (36/360 = 0.1 frame) and reflashed my GBS-C. I am not making this change in this PR, as it requires further testing with 240p/480i transitions, and further changes to be production-ready:

Low-latency measurements

I then plugged the GBS-C back into my CRT and checked the resulting latency using a solar panel and audio interface (on the Wii Home screen, because I'm lazy).

Data at 960p output 0.1 frame target.aup3.zip.

Results:

Conclusion

At this point I'd say this code is reliable and well-tested (though I haven't had other people verify it on their devices), and I'm not opposed to merging.

There remain code-level nitpicks; for example, I shouldn't have added separate FS_DEBUG and FRAMESYNC_DEBUG flags (I think these can be unified, since FS_DEBUG mostly affects vtotal sync and FRAMESYNC_DEBUG mostly affects frequency sync). But I don't particularly care, since they don't affect users, don't block the critical path of maintenance, and should be easy to resolve.

ramapcsx2 commented 1 year ago

I can say that the 1080p output presets are at the limit of the chip, and indeed compromised. I couldn't get a nice aspect ratio while keeping sharp scaling, so whatever is there now is some form of compromise, not a preferred choice :p

Latency sounds good. Basically, all latencies below 16.6 ms / one frame are a great result, and it doesn't matter whether it's closer to 4 ms or to 2 ms. Latency only becomes an issue once it adds up to perceivable lag, which would be in the several-frames range.

If you want to preserve more of this work's history, a merge commit is fine. If it doesn't matter much to you, squash is fine :)