libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com
GNU General Public License v3.0

[Feature Request] (BFIv3) Emulate a CRT Electron Beam Via Rolling-Scan BFI #10757

Open mdrejhon opened 4 years ago

mdrejhon commented 4 years ago

New Methods of Emulating CRT Via Sheer Hz

Also reduces motion blur without needing a strobe backlight like LightBoost or ULMB

I'm the founder of Blur Busters and creator of TestUFO. We now have 240Hz 1ms IPS panels, and DELL is releasing a 360Hz IPS monitor this summer. This is an amazing opportunity for emulation.

Glossary
BFI = Black Frame Insertion, used to reduce motion blur on LCDs to mimic a CRT
BFIv1 = Classic 60fps at 120Hz BFI, already implemented in RetroArch
BFIv2 = Improvements to BFI for higher Hz, see GitHub #10754
BFIv3 = Rolling-bar BFI, emulating CRT electron beam temporally, FOCUS OF THIS GITHUB ITEM

Optional but recommended related items: Add retro_set_raster_poll #10758 if you need to do beamracing simultaneously with a CRT electron beam emulator. And add precision frame-presenter thread #11390 if you need to improve precision of framebuffer presenting-to-screen.

BFIv2 versus BFIv3, and Potential $500-$700 Source Code Bounty (reuse of existing bounty from 2018)

While BFIv2 is easier to implement, BFIv3 is a long-term Holy Grail feature for the refresh rate race to retina refresh rates: using sheer brute Hz to emulate a CRT electron beam temporally via a rolling-bar emulation.

If you beam-race this feature, then this potentially qualifies for the already-posted BountySource bounty (read for more info).

Why Emulate a CRT Electron Beam On High-Hz Displays?

We need to better mimic CRTs in software. Thankfully, brute refresh rates make this possible. With the arrival of 240Hz IPS (ASUS) and 360Hz IPS (DELL AW2521H) this summer, it is now time to start talking about emulating a CRT beam at the sub-refresh-cycle level.

The long-term futurist vision is that a 1000Hz display will be able to emulate a CRT electron gun, by displaying a "rolling bar" with a phosphor fade trail. We aim to do that in software rather than in display hardware. 240Hz and 360Hz are now high enough refresh rates to begin implementing this.

ASUS has confirmed a long-term road map to 1000Hz displays (those unaware can read more in Blur Busters Law: The Amazing Journey To Future 1000Hz Displays, with scientific citations included). With 120Hz becoming more mainstream (iPhone/Galaxy), high Hz is expected to become an inexpensive inclusion in ever more future panels, with refresh rates doubling approximately every 5-10 years. These numbers may sound crazy, but the impetus for ultra-high Hz is currently esports and VR, along with the need for screens to emulate real life (real life doesn't flicker, and eliminating motion blur without a hardware strobe requires insanely high refresh rates).

This item may, depending on parameters, be able to qualify for the $500-$700 BountySource bounty posted for #6984, because the techniques in this github item may be easier to implement on a cross-platform basis, while allowing simpler modifications to make this github item functionally identical to #6984! (This doesn't necessarily have to happen, but one can hit two birds with one stone).

Inspired by a forum thread on ArcadeControls as well as MAME Temporal HLSL, I'm crossposting the request, with some modifications.

Rolling-Bar Black Frame Insertion Concept

It can even achieve sub-refresh latencies using ordinary VSYNC ON. One wouldn't have to care about how a display refreshes. For example, at 240Hz, you'd display 1/4th of an emulator refresh cycle (rasterplotted in real time), in 4 separate output frames.

For a 60Hz emulator module onto a 240Hz monitor

The rest of the frame would be black (except for any required alphablend overlaps to eliminate seams/artifacts)

Make top/bottom edges of bars fuzzy to reduce/eliminate tearing artifacts (avoid emulating the look of VSYNC OFF tearing). I have confirmed that you have to overlap the bars between output refresh cycles. Use alpha-blend-to-black slightly beyond the 1/4-height frames for the 240Hz situation example. This will prevent tearing artifacts. Make sure to gamma-correct the alphablend overlaps. Remember that RGB(128,128,128) is not exactly half the photons of RGB(255,255,255). Use a configurable gamma correction number in HLSL config file.

It can also accomplish beam raced latency without black frames too

The rolling scan BFI could in theory be configurable to full persistence. It would achieve beam raced latencies to the output refresh cycle granularity.

For a 60Hz emulator module onto a 240Hz monitor

Make sure to alphablend the seams (blur the refresh overlap line) to prevent tearing artifacts, otherwise it looks like "a VSYNC OFF emulation in VSYNC ON". The alphablend fixes the tearing.

This allows beam-raced latencies via sheer brute Hz. So, this module could be configurable to have full persistence or rolling low persistence (rolling black period).

This will scale very well to the future, in the refresh rate race to retina refresh rates too. A future 1000Hz display would emulate original machine latency to an error margin of 1ms, duplicating sub-refresh original-machine latencies, regardless of display technology (scan direction, display refresh pattern, etc). Most high-Hz gaming displays have a sub-refresh processing delay, so the higher Hz you go, the more it converges into original-machine latency.

Very few cross-platform dependencies: the only things needed are VSYNC ON and the ability to do framerate=Hz.

Emulator Hz and destination Hz don't need to be divisible.

Situation Example of 60Hz CRT emulation onto a 200Hz LCD ... This formula is very Hz-agnostic.

Emulator Refresh Cycle 1:
....Real Refresh 1: full 60/200th height bar (30% screen height), at 0%-30% vertical position
....Real Refresh 2: full 60/200th height bar (30% screen height), at 30%-60% vertical position
....Real Refresh 3: full 60/200th height bar (30% screen height), at 60%-90% vertical position
....Real Refresh 4: 1/3 of 60/200th height bar (10% screen height), at 90%-100% vertical position

Emulator Refresh Cycle 2:
....Real Refresh 5: 2/3 of 60/200th height bar (20% screen height), at 0%-20% vertical position
....Real Refresh 6: full 60/200th height bar (30% screen height), at 20%-50% vertical position
....Real Refresh 7: full 60/200th height bar (30% screen height), at 50%-80% vertical position
....Real Refresh 8: 2/3 of 60/200th height bar (20% screen height), at 80%-100% vertical position

Emulator Refresh Cycle 3:
....Real Refresh 9: 1/3 of 60/200th height bar (10% screen height), at 0%-10% vertical position
....Real Refresh 10: full 60/200th height bar (30% screen height), at 10%-40% vertical position
....Real Refresh 11: full 60/200th height bar (30% screen height), at 40%-70% vertical position
....Real Refresh 12: full 60/200th height bar (30% screen height), at 70%-100% vertical position

This is just a conceptual example of temporal compensation. Algorithmically, this can be used for black frame insertion (rolling bar, some bars with image data, other bars black, with alphablended bleed overlap) or for "beam racing via brute Hz" (all bars with image data, new emu Hz overwriting old emu Hz, alphablend/blur the seams) or both simultaneously (rolling BFI + beam racing simultaneously). You could adjust the alphablend factor up/down.
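As a rough illustration of how Hz-agnostic this bookkeeping is, here is a minimal Python sketch that computes which slice of the emulator's scanout each real refresh would carry. The function name and tuple layout are made up for this example. One assumption to flag: a real refresh that straddles two emulator refresh cycles is reported as a single output frame holding two segments, so the two partial bars on either side of a cycle boundary (listed above as separate real refreshes, e.g. 4 and 5) share one output frame here.

```python
from fractions import Fraction


def rolling_scan_schedule(source_hz, dest_hz, num_refreshes):
    """Return, per real refresh, a list of (emulator_cycle, top, bottom) segments.

    top and bottom are fractions of screen height (0 = top edge, 1 = bottom edge).
    Exact fractions avoid drift when the two rates are not divisible.
    """
    step = Fraction(source_hz, dest_hz)  # screen heights scanned per real refresh
    schedule = []
    for n in range(num_refreshes):
        pos, end = n * step, (n + 1) * step
        segments = []
        while pos < end:
            cycle = int(pos // 1)  # 0-based emulator refresh cycle being scanned
            seg_end = min(end, Fraction(cycle + 1))  # clip at the bottom of the screen
            segments.append((cycle, pos - cycle, seg_end - cycle))
            pos = seg_end
        schedule.append(segments)
    return schedule


def as_percent(value):
    return f"{float(value) * 100:.0f}%"


# 60Hz emulator onto a 200Hz display: each real refresh scans 30% of screen height.
for number, segments in enumerate(rolling_scan_schedule(60, 200, 10), start=1):
    parts = [f"cycle {c + 1}: {as_percent(t)}-{as_percent(b)}" for c, t, b in segments]
    print(f"Real Refresh {number}: " + ", ".join(parts))
```

With divisible rates such as 60Hz onto 240Hz, the same function yields exactly one quarter-height segment per real refresh.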

Theoretically can also accomplish stutterless standards conversion (50Hz onto 60Hz)

This uses a formula SIMILAR to the one above, combined with a huge overlap alphablend adjustment (e.g. 50% or 75% screen height of overlap, perhaps). It is basically a scanout-enhanced version of a common alphablend standards-conversion algorithm, and would thus eliminate stutters. The taller the alphablend overlap, the less standards-conversion stutter. The bar overlap should be a configurable parameter in the configuration file.

This would conceptually be a more advanced version of the software-based variable refresh animation: www.testufo.com/vrr ....where I'm able to play any framerate stutterlessly onto any refreshrate, with a very simple alphablend algorithm not too dissimilar from the common 50/60 standards conversion alphablend algorithm.

So we're just "abusing" the alphablend overlap feature of this theoretical "Temporal HLSL" as the method of destutter in much the same stutter-eliminating way.
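For reference, the plain (non-scanout) frame-blending idea can be sketched in a few lines of Python. This is only the whole-frame variant, not the scanout-enhanced one proposed here, and the function name is hypothetical. It assumes each output refresh shows a mix of the source frames that were "on screen" during that refresh interval, weighted by time overlap.

```python
from fractions import Fraction


def blend_weights(source_hz, dest_hz, n):
    """Map source frame index -> blend weight for output refresh number n.

    The weight is the share of the output refresh interval during which
    that source frame is the current one. Weights always sum to 1.
    """
    t0, t1 = Fraction(n, dest_hz), Fraction(n + 1, dest_hz)
    weights = {}
    k = int(t0 * source_hz)  # first source frame that overlaps this refresh
    while Fraction(k, source_hz) < t1:
        f0, f1 = Fraction(k, source_hz), Fraction(k + 1, source_hz)
        overlap = min(t1, f1) - max(t0, f0)
        if overlap > 0:
            weights[k] = overlap * dest_hz  # normalise to the refresh duration
        k += 1
    return weights


# 50Hz source onto a 60Hz display: the first few output refreshes.
for n in range(6):
    print(n, {k: str(w) for k, w in blend_weights(50, 60, n).items()})
```

The blended pixel values should still be mixed in linear light (gamma-corrected), as discussed elsewhere in this item.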

Apparently, @TomHarte already tested something similar (some kind of scanout-alphablend algorithm) in his experiments with CLK.

Temporal HLSL concept is a universal CRT emulator with amazing crossplatform flexibility

It scales up and down universally

OPTIONAL: Theoretically compatible with hardware beamraced VSYNC

See #6984 for Beam Raced VSYNC

Thus, it would produce latencies identical to the Lagless VSYNC / Beamraced VSYNC approaches, like the one already implemented in WinUAE https://github.com/tonioni/WinUAE/issues/133

Certainly, actual-hardware beam racing would be 100% optional (since it requires minor platform-dependent and display-scanout-direction knowledge)

I only add this section to say that Temporal HLSL is 100% compatible with hardware-based beam racing simply by virtualizing a higher-Hz display internally, e.g. doing 600 frames per second of Temporal HLSL into RAM, and then doing 600-frameslice beamracing onto a real 60Hz display, subject to a jitter margin to hide problems caused by CRT-curvature algorithms as well as computer performance jitter.

Software-based beam racing: Output Hz massively above Emulator Hz. The art of emulating a CRT electron beam via brute Hz.
Hardware-based beam racing: Output Hz same as Emulator Hz. Beam raced synchronization of rasterplotting emulator's raster ahead of real display's raster. Using beam raced VSYNC OFF frameslices, that would work off a Temporal HLSL framebuffer too. (An approach already successfully implemented in a few emulators)

Basically the Temporal HLSL concept is compatible with both software-based beam racing and hardware-based beam racing. Then it would produce identical latencies to #6984

mdrejhon commented 4 years ago

Task Breakdown Simplification:

GitHub item #10758 is a pre-requisite for this.

I broke out the retro_set_raster_poll pre-requisite separately, because it's a universal requirement for all possible beamraceable output techniques.

mdrejhon commented 4 years ago

Frequently Asked Questions

Question: How Much Display Motion Blur Can I Reduce?

Answer 1: Answer For Sample-And-Hold Displays like LCDs without using Blur Reduction Mode

Software BFI also reduces motion blur without a strobe backlight, up to these following numbers:

120Hz refresh rate: 50% motion blur reduction for software BFI for emulator 60Hz
180Hz refresh rate: 66% motion blur reduction for software BFI for emulator 60Hz
240Hz refresh rate: 75% motion blur reduction for software BFI for emulator 60Hz
300Hz refresh rate: 80% motion blur reduction for software BFI for emulator 60Hz
360Hz refresh rate: 83% motion blur reduction for software BFI for emulator 60Hz
480Hz refresh rate: 88% motion blur reduction for software BFI for emulator 60Hz
1000Hz refresh rate: 94% motion blur reduction for software BFI for emulator 60Hz

Successfully demo'd in TestUFO (when run on 240Hz+ monitors), the remaining motion blur will be (Source-Hz) divided by (Destination-Hz), because persistence equals the time each refresh cycle stays visible. The higher the Hz, the easier it is to emulate shorter-persistence phosphor.

Display motion blur is directly proportional to pixel visibility time, as seen in www.testufo.com/eyetracking. We're simply using brute Hz for ON-then-OFF to simulate the briefness of phosphor. The higher the Hz, the briefer we can make pixels flash. This is true for both the BFIv2 approach (#10754) and the BFIv3 approach (this GitHub item).
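The arithmetic behind those percentages is small enough to sketch in Python. The function name is made up for illustration, and it assumes the ideal case: exactly one visible destination refresh per emulator frame, instant pixel response, and no blending overlap.

```python
def bfi_blur_reduction(source_hz, dest_hz):
    """Upper bound on motion blur reduction for software BFI.

    Remaining blur is source_hz / dest_hz of the sample-and-hold blur,
    so the reduction is one minus that fraction.
    """
    remaining_blur = source_hz / dest_hz
    return 1.0 - remaining_blur


for dest_hz in (120, 180, 240, 300, 360, 480, 1000):
    reduction = bfi_blur_reduction(60, dest_hz)
    persistence_ms = 1000.0 / dest_hz  # visible time of one destination refresh
    print(f"{dest_hz}Hz: up to {reduction * 100:.1f}% less blur, "
          f"{persistence_ms:.2f}ms persistence")
```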

Answer 2: For impulsed displays such as LightBoost / ULMB / ELMB / DyAc / VRB etc displays

Strobe backlights are the ones found in Motion Blur Reduction FAQ.

If a display has a strobe backlight mode, see #10754 for how software BFI simultaneously helps hardware strobing.

Software BFI combined with a hardware strobe backlight will usually work better in a global mode (e.g. using software BFI to convert a 120Hz strobe to a 60Hz strobe by blacking out every other strobe). There are many useful reasons to combine software strobing with hardware strobing.

The anticipation is that over the long term, sheer refresh rate will make strobe backlights unnecessary, since 1000fps at 1000Hz creates a blurless sample-and-hold display. However, it also provides very fine-granularity opportunity of emulating CRT phosphor.

Question: Does Rolling Scan Flicker Less Than Global Strobing?

Answer: Yes.

A rolling BFI keeps the average number of photons hitting human eyeballs consistent (unlike the global flashing strobe of LightBoost / ULMB / etc). So rolling BFI is more eye-friendly than global-flash BFI (only 60 global flashes per second, complete blackness in between).

A CRT tube is almost always emitting light somewhere on the tube surface, because of the scanning phosphor dot. That softens the harshness of flicker a little, and explains why 60Hz global-strobe looks more flickery than 60Hz scanning-strobe.

Question: Could Rolling BFI (v3) And Global BFI (v2) Be Handled In This Same Algorithm?

Answer: Theoretically, Yes.

See #10754 for more information about global non-rolling BFI.

BFIv3 (rolling BFI) could be programmed in a way to also be configurable to behave like BFIv2 (global BFI in #10754).

That way, a user can choose to configure rolling BFI or global BFI. Then, BFIv3 would still be able to continue handling 60Hz BFI onto 120Hz displays as it used to, as well as other BFI cadences that benefit 180Hz, 240Hz, 300Hz, 360Hz, etc, as listed in #10754

Question: Won't Picture Get Dark?

Answer: Yes but HDR is Making Future Displays Brighter

By the mid 2020s, it is expected that cheap thousand-element FALD LCDs (1000-nit HDR) will fall well below $1000 for desktop monitors.

The extra brightness headroom of HDR is redirectable into compensating for brightness loss during black frame insertion. A 240Hz 1000-nit display can still do software rolling BFI at 250 nits, for example.
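The 250-nit figure follows from the duty cycle. A minimal Python sketch, with a hypothetical function name, and assuming the panel can hold its peak brightness inside the lit bar while ignoring overlap blending and simulated phosphor fade:

```python
def bfi_average_nits(peak_nits, source_hz, dest_hz, lit_refreshes=1):
    """Time-averaged brightness when each pixel is lit for `lit_refreshes`
    destination refreshes out of every emulator refresh cycle."""
    duty_cycle = min(1.0, lit_refreshes * source_hz / dest_hz)
    return peak_nits * duty_cycle


print(bfi_average_nits(1000, 60, 240))  # 60Hz CRT simulation on a 240Hz 1000-nit panel
```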

There was a 10,000 nit Sony prototype display at an earlier CES that I saw with my eyes. It was damn impressive seeing realistic sun glints on a chromed 1957 Chevy (or similar classic car) that was brighter-than-white. And shockingly realistic neon signs in night scene. Sheer dynamic range for HDR.

HDR is something that will benefit CRT electron gun emulation!

mdrejhon commented 3 years ago

UPDATE: There is an item that may make BFIv3 easier/more flexible.

[Feature Request] Futureproof RetroArch with precision frame pacing presenter thread

mdrejhon commented 3 years ago

Interesting Equivalence: Conceptual Emulation of a High Speed Video of a CRT Tube

One conceptual way to more easily understand this github item:

A high speed video of a CRT tube. You see a rolling bar in those, with blurry edges. This is seen in many YouTube videos, where you see phosphor trailing behind.

Did you know that playback of a 1000fps high-speed video of a CRT tube -- in real time onto a true 1000Hz display -- allows a 1000Hz display to perceptually emulate the original CRT?

The "BFIv3" or "Temporal HLSL" concept aims to emulate that behavior in software. The best emulation requires retina resolution (spatial HLSL filter) + retina refresh rate (temporal HLSL) + retina HDR (to keep the beam emulation bright).

A display that combines all of this will take some time to arrive, but this encompasses the Venn diagram of capturing the look and feel of a CRT tube (at least for flat tubes).

(plus a slight amount of edge-alphablending for overlaps to prevent tearing artifacts from appearing, especially for the sharp bottom edge of the rolling-scan bar as seen in high speed videos.)

mdrejhon commented 1 year ago

UPDATE ON PRACTICALITY

We have heard announcements of 240 Hz OLEDs such as the new 240 Hz OLED laptops, and we've heard of the Corsair 240 Hz OLED announcement too. This is FANTASTIC for a CRT electron beam simulator.

240Hz OLEDs actually make CRT electron beam simulators pretty workable today.

I coded an early "crude" CRT simulator in JavaScript for an internal experiment (for a future TestUFO animation). It doesn't work well on current-Hz LCDs, unless you're at ~360Hz+

However... OLED works MUCH better!

An early custom internal "TestUFO CRT Simulator" experiment (to be released in 2023) shows that you only need 240Hz on an OLED in order for CRT electron beam simulators to become superior to classical monolithic 1:1 BFI in several key criteria, at least when it comes to instant-pixel-response displays such as OLED.

Good strobing reduces motion blur more at 240Hz (you need 1000Hz for software-based CRT electron beam simulators to be as low-blur as hardware strobing), but the big problem is that 60Hz flicker is more painful (than on an original 60Hz CRT) when it is squarewave flicker. The 60Hz flicker is a LOT less painful with a CRT electron beam simulator, closer to how an original CRT strained your eyes, since the rolling scan means photons are hitting eyeballs at all times, and the simulated phosphor decay softens the flicker.

You get a fair bit of light loss, but you also get that with ordinary BFI and strobing too.

A minor artifact does show up, visible only during ultrafast motion if you give a software-based CRT electron beam simulator too few Hz (such as 240): fast motion still shows slight seams from the low-granularity refresh between adjacent refresh cycles, but they are much less noticeable than traditional VSYNC OFF tearing artifacts thanks to the blurred/alphablended overlaps between adjacent refresh cycles of simulated CRT electron beam timeslices. The more Hz you simulate a CRT beam with, the less visible this artifact becomes, especially if you're also simulating a low-Hz CRT with era-based phosphor dot bloom and low-DPI shadowmasks/aperture grilles (a la HLSL filters), which greatly masks this.

This adds an extra adjustment that may be needed, at least for the entry-level refresh rates (240Hz+) used for CRT electron beam simulators: "Adjust amount of refresh cycle blending overlap" to mask this artifact. As a tradeoff, this slightly reduces the persistence reduction. Here 0% would represent a sharp VSYNC OFF tearline, and 100% would be completely invisible but have ~2x more motion blur (1/120sec motion blur during a 240Hz CRT beam simulator, 1/500sec motion blur during a 1000Hz CRT simulator). Low settings like 2% or 5% would show up as a blurry tearline (e.g. 10 to 50 pixels tall), while 100% is full screen height, or at least double the height of the rolling-scan window, to shingle-overlap adjacent refresh cycles with a temporal alphablend brightness per pixel (to eliminate bright/dark banding side effects). Make sure that same-color pixels emit the same number of photons regardless of how many of the photons are emitted in the previous refresh cycle for that pixel and how many are emitted in the next refresh cycle for that pixel.

This does raise the complexity of CRT simulators because of a necessary low-Hz de facto temporal-antialiasing-style algorithm to eliminate tearlines of the rolling scan (funny that we call "240Hz" a low Hz for a CRT simulator). But it works, provided you gamma-correct the alphablending.

Gamma correction of temporal alphablending is needed because RGB(127,127,127) is not exactly half the number of photons of RGB(254,254,254). So we need to include a gamma correction argument, with a default of 2.2, for the temporal alphablender logic in the CRT electron beam simulator.
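To make the gamma point concrete, here is a small Python sketch. It assumes a pure power-law display response with the configurable gamma (default 2.2), which is a simplification of real transfer curves such as sRGB, and the helper names are made up for illustration.

```python
def to_linear(value, gamma=2.2):
    """Convert an 8-bit gamma-encoded value to relative light output (0..1)."""
    return (value / 255.0) ** gamma


def to_encoded(linear, gamma=2.2):
    """Convert relative light output (0..1) back to an 8-bit encoded value."""
    return round(255.0 * linear ** (1.0 / gamma))


def scale_photons(value, fraction, gamma=2.2):
    """Encoded value that emits `fraction` of the light that `value` emits."""
    return to_encoded(to_linear(value, gamma) * fraction, gamma)


# Halving the number (254 -> 127) leaves only about 22% of the light, not 50%.
print(to_linear(127) / to_linear(254))

# The value that really emits half the light of 254 is noticeably higher than 127.
print(scale_photons(254, 0.5))
```

Any alphablend weight used for the overlap between adjacent refresh cycles would be applied in linear light like this, then converted back before display.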

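So +2 more variables are recommended for CRT simulator software:

Amount of refresh cycle blending overlap (0% to 100%)
Gamma correction value for the temporal alphablend (default 2.2)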

Nonetheless, the motion blur reduction in some fast-scrolling games (e.g. Sonic the Hedgehog) outweighs this significantly.

So beginning 2023, better-than-BFI electron beam simulators are already looking good! The provisional minimum requirement for them to be superior to monolithic BFI is a 240Hz (or more) OLED, or a 360Hz (or more) LCD. This is close to what I predicted, as long as the tearing-blurring logic is added to compensate for limited Hz.

One neat thing is that it's absolutely fantastic to see a rolling CRT simulator in slow-motion. So developers on 60Hz displays can run the emulator at 1/4 or 1/6 speed and watch a CRT flicker in slow-motion, much like a slow-motion video of a CRT tube, except it's the CRT electron beam simulator software in action!

Even better if the algorithm also accommodates the fact that rise time (illumination) is much faster than fall time (phosphor fade). This is very hard if your Hz is limited to 240 and you can only emulate a CRT beam at 1/240sec (4.17ms) granularity, but it will become more important as Hz goes up.


Potential Trailblazing Software

LCD simulator software already exists

A Blur Busters forum member already wrote an open-source LCD simulator (LCD GtG simulation) called Blurinator 9000, in this forum thread https://forums.blurbusters.com/viewtopic.php?f=7&t=8569&p=68257 ... It bears an uncanny resemblance to a real LCD, with overdrive overshoot artifacts and strobe crosstalk artifacts correctly simulated in software!

Looking forward to equivalent for CRT simulation

What we need is an algorithm/software just like that one, except one that simulates a CRT electron beam instead. We're evaluating refresh rate independent algorithms that scale automatically (become better) with refresh rate increases.

Performance Optimization Found: Pregenerated Alpha Masks as Speedup

I feel that a generic simplified beam simulator algorithm will help developers greatly, especially one that can be simplified as prerendering during startup (e.g. rolling-bar alpha masks pregenerated at startup, upon settings changes, and upon refresh rate changes) to allow CRT beam simulation without requiring heavy GPU horsepower.

Although it's possible to fly the beam in real time in a GPU shader, this isn't really necessary at 240-1000Hz if you prerender alphamasks (destHz/sourceHz masks are needed, e.g. the 4 rolling positions for simulating 60fps on 240Hz). Then producing one frame of CRT simulation is just simple bitmap-blending mathematics between two frames: apply alphamasks, add two bitmaps together, and so on.

These pregenerated CRT electron beam simulator alphamask bitmaps can simply be generated upon:

Startup
Settings changes
Refresh rate changes

The alphamask can have all corrections built into it, including the tearline-blurring algorithm! This brought it all into the performance budget of JavaScript, allowing me to prepare a TestUFO CRT Simulator for release in 2023 (which will warn you if your display refresh rate is not at least 240).

The alphamask is at native emulator resolution (e.g. 256x240 or whatever), so the alphamask operations are pretty low-GPU-budget.

CRT beam simulators really only need to focus on verticals only, so you only need one alphablend value per pixel row in the prerendered alphamask approach. So you only need 1 value per pixel row per subrefresh step (4 subrefresh steps when doing 60Hz CRT on 240Hz display).
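A minimal Python sketch of what such a pregenerated per-row mask table could look like. This is not the TestUFO implementation: the function names, the linear ramp across each seam, the pure power-law gamma, and the choice to leave the very top and bottom screen edges un-feathered are all assumptions made for illustration. A memoizing cache stands in for "regenerate on startup, settings change, or refresh rate change", since any changed parameter produces a fresh table.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_row_masks(rows, source_hz, dest_hz, overlap_rows=0, gamma=2.2):
    """Return one tuple of per-row multipliers for each subrefresh step.

    Multipliers are meant for gamma-encoded pixel values. In linear light,
    the weights of any one row add up to 1 across all steps, so a pixel
    emits the same total light however it is split between steps.
    """
    if dest_hz % source_hz != 0:
        raise ValueError("this sketch needs dest_hz divisible by source_hz")
    steps = dest_hz // source_hz
    bar = rows / steps  # height of the rolling bar in rows
    if not 0 <= overlap_rows <= bar:
        raise ValueError("overlap_rows must be between 0 and one bar height")
    half = overlap_rows / 2
    masks = []
    for s in range(steps):
        top, bottom = s * bar, (s + 1) * bar
        mask = []
        for r in range(rows):
            y = r + 0.5  # vertical centre of this pixel row
            if overlap_rows == 0:
                light = 1.0 if top <= y < bottom else 0.0  # hard-edged bar
            else:
                light = 1.0
                if s > 0:  # ramp up across the seam shared with the previous bar
                    light = min(light, (y - (top - half)) / overlap_rows)
                if s < steps - 1:  # ramp down across the seam with the next bar
                    light = min(light, ((bottom + half) - y) / overlap_rows)
                light = max(0.0, light)
            mask.append(light ** (1.0 / gamma))  # linear weight -> encoded multiplier
        masks.append(tuple(mask))
    return tuple(masks)


def apply_mask(frame, mask):
    """Multiply every pixel row of `frame` (lists of 0-255 values) by its mask value."""
    return [[round(v * m) for v in row] for row, m in zip(frame, mask)]


# 60Hz emulator at 240 rows on a 240Hz display, with a 20-row feathered seam.
masks = build_row_masks(240, 60, 240, overlap_rows=20)
frame = [[200] * 4 for _ in range(240)]  # flat grey stand-in for an emulator frame
subframes = [apply_mask(frame, m) for m in masks]  # the 4 output frames
```

Because there is only one value per row per step, the whole table for this example is 4 x 240 numbers, which fits the low-budget goal described above.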

This brings CRT beam simulators within the performance window of an old internal Intel Iris GPU! It turns the operation from a GPU-performance bottleneck into a memory-only performance bottleneck.

Most newer internal GPUs have no problems mathing-together 240 simple low-resolution bitmaps per second during simulating 60Hz CRT on a 240Hz display. Bitmaps only need to be scaled to high resolution only at the final step, for spatial CRT filters.

What will be needed is to open-source the formulas necessary to generate rolling-simulation alphamasks. Nominally the formulas will need to know the variables below:

Required Variables for a Temporal CRT Beam Simulator

Configuration file, system metrics, command line arguments, defaults, etc...

For simplicity, formula may require hardware refresh rate to be divisible by simulated refresh rate, if you're using the pre-generated alphamask technique to improve performance. But this is actually not critical if you use the realtime GPU-based flying-spot CRT beam simulator.

The latter is a Holy Grail (realtime horizontal & vertical simulated movement of a CRT beam) but not critical, especially at under 1000Hz. So the prerendered alphamask may very well become the predominant method of initial CRT electron beam simulators coming in the 2020s.

Spatial filters can be applied to the final output, to allow the CRT texture feel. I find that the temporals need to be executed before the spatials. No change is needed to spatial filters, except that the exact same spatial filter needs to be applied to every single subframe (the 4 subframes of a 60Hz-on-240Hz CRT simulation).