jonathanhogg / flitter

A functional programming language and declarative system for describing 2D and 3D visuals
https://flitter.readthedocs.io
BSD 2-Clause "Simplified" License

OpenGL render bug on Apple Silicon #23

Closed: jonathanhogg closed this 6 months ago

jonathanhogg commented 8 months ago

See https://github.com/jonathanhogg/flitter/issues/22#issuecomment-1808168411 for details. Reported by @mdales.

[Screenshot]

jonathanhogg commented 8 months ago

@mdales: if you get time, could you run it with --debug (or --trace) and attach the output? Ta.

mdales commented 8 months ago
flitter --fullscreen --debug examples/physics.fl
13:57:47.453 9906:.engine.control  | INFO: Added code page 0: examples/physics.fl
13:57:47.453 9906:.engine.control  | INFO: Switched to page 0: examples/physics.fl
13:57:47.454 9906:asyncio.selector_events | DEBUG: Using selector: KqueueSelector
13:57:47.563 9906:.language.tree   | WARNING: Partial-evaluation error: Unbound name 'RECORD'
13:57:47.568 9906:.cache           | DEBUG: Read program: examples/physics.fl
13:57:47.569 9906:.cache           | DEBUG: Compiled to 26096 instructions in 15.5/4.9/5.5ms
13:57:47.569 9906:.engine.control  | SUCCESS: Loaded page 0: examples/physics.fl
13:57:47.572 9906:.cache           | DEBUG: Read program: examples/bloom.fl
13:57:47.572 9906:.cache           | DEBUG: Compiled to 10 instructions in 3.2/0.0/0.0ms
13:57:47.574 9906:.cache           | DEBUG: Read text: examples/glsl/blur.frag
13:57:47.574 9906:.cache           | DEBUG: Read text: examples/glsl/color_adjust.frag
13:57:47.574 9906:.engine.control  | INFO: Start counter, tempo 120.0, quantum 4
13:57:47.636 9906:.render.physics  | DEBUG: New physics :bubble with 501 particles and 502 forces
/Users/michael/Dev/flitter/venv/lib/python3.11/site-packages/glfw/__init__.py:916: GLFWError: (65540) b'Invalid window size 0x0'
  warnings.warn(message, GLFWError)
13:57:47.885 9906:.render.window   | DEBUG: window opened on screen 0
13:57:47.885 9906:.render.window   | DEBUG: OpenGL info: Apple M1 Pro 4.1 Metal - 86
13:57:47.886 9906:.render.window   | DEBUG: window resized to 3360x1890 (viewport 3780x3780 x=1470 y=0)
13:57:47.898 9906:trimesh.interfaces.blender | DEBUG: searching for blender in: /Users/michael/Dev/flitter/venv/bin:/Users/michael/.opam/default/bin:/Users/michael/.docker/bin:/bin:/sbin:/usr/bin:/usr/sbin:/opt/homebrew/bin:/usr/local/sbin:/usr/local/go/bin:/Users/michael/go/bin:/Users/michael/.cargo/bin:/usr/local/bin:/Applications/Visual Studio Code.app/Contents/Resources/app/bin:/Applications/blender.app/Contents/MacOS:/Applications/Blender.app/Contents/MacOS:/Applications/Blender/blender.app/Contents/MacOS
13:57:47.899 9906:trimesh.caching  | DEBUG: falling back to hashlib hashing: `pip install xxhash`for 50x faster cache checks
13:57:48.097 9906:.render.window.canvas3d | DEBUG: Created canvas3d 2160x2160/16-bit render target with 4x sampling
13:57:48.098 9906:.render.window.canvas3d | DEBUG: Compiling standard lighting shader for 50 max lights
13:57:48.102 9906:.render.window.models | DEBUG: Preparing model !sphere/3
13:57:48.102 9906:trimesh.primitives | DEBUG: creating mesh for Sphere primitive
UNSUPPORTED (log once): POSSIBLE ISSUE: unit 0 GLD_TEXTURE_INDEX_2D is unloadable and bound to sampler type (Float) - using zero texture because texture unloadable
13:57:48.146 9906:.render.window   | DEBUG: shader#contrast GL program compiled in 1.8ms
13:57:48.148 9906:.render.window   | DEBUG: shader#blur GL program compiled in 1.1ms
13:57:48.149 9906:.render.window   | DEBUG: shader#blur GL program compiled in 0.9ms
13:57:48.151 9906:.render.window   | DEBUG: shader#lighten GL program compiled in 0.7ms
13:57:48.152 9906:.render.window   | DEBUG: window GL program compiled in 0.6ms
13:57:53.280 9906:.engine.control  | INFO: 58.2fps;  1.4/ 6.3/ 0.8ms (run/render/sys); perf 1.20

(Note: to do this I cleared out all my debugging prints and such, so this should be the same code as the head of main.)

jonathanhogg commented 8 months ago

Nothing obvious there unfortunately. I'm gonna upgrade to Sonoma tonight and see if I can reproduce or if it's an Apple Silicon GPU driver thing.

mdales commented 8 months ago

FWIW, this is what it also looks like under Windows 11 using WSL. To the untrained eye, it looks the same as what I get on Sonoma?

[Screenshot 2023-11-14 143136]

Otherwise, as a general response to your comment at BarCamp: it works just fine under WSL AFAICT. It can even handle windowed mode better than macOS Sonoma :)

jonathanhogg commented 8 months ago

I am busy upgrading all the things to see if I can reproduce…

jonathanhogg commented 8 months ago

Well, of course, I've upgraded everything and it looks just fine on my Mac 🙄

So I'm guessing it must be a driver issue rather than something in Sonoma or Windows. The Radeon drivers I use must have subtly different OpenGL behaviour.

Can you try switching to colorbits=8 on the !window node and see if that makes a difference? I'm wondering if there are odd values in the floating-point colour texture and the Apple Silicon / WSL drivers are treating them differently than the Radeon ones do.

[Oops! Edited to have correct node kind.]
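To illustrate the "odd values in the floating-point colour texture" idea, here is a small sketch (my own, not flitter's code) of how a 16-bit float channel and an 8-bit channel differ in what they can hold. It assumes the 16-bit target behaves like IEEE half precision (what Python's `struct` `'e'` format packs) and models the 8-bit channel as a clamp to [0, 1] followed by quantisation to steps of 1/255; `store_float16` and `store_unorm8` are hypothetical helper names.

```python
import math
import struct


def store_float16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def store_unorm8(x: float) -> float:
    """Hypothetical model of an 8-bit normalised channel.

    Clamps to [0, 1] and quantises to steps of 1/255. NaN is deliberately not
    modelled: how a driver converts NaN to fixed point is outside this sketch.
    """
    if math.isnan(x):
        raise ValueError("NaN conversion to 8-bit is not modelled here")
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * 255) / 255


if __name__ == "__main__":
    for value in (0.5, -1.9, 2.0, float("inf"), float("nan")):
        half = store_float16(value)
        byte = "n/a" if math.isnan(value) else f"{store_unorm8(value):.4f}"
        print(f"{value!r:>6} -> float16 {half!r:>14}  unorm8 {byte}")
```

Under this model, a negative light such as the `color=-1.9;0;0` in `physics.fl` survives (approximately) in a float16 target, as do infinities and NaNs, while the 8-bit target clamps everything into [0, 1]. That is why switching `colorbits` is a cheap way to test whether out-of-range stored values matter.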

mdales commented 8 months ago

As requested, here's a video of it with colorbits=8:

https://github.com/jonathanhogg/flitter/assets/28506/70192980-07d8-4d93-8fde-be12fbfb2608

jonathanhogg commented 8 months ago

Still knackered then.

How about with this change:

diff --git a/examples/physics.fl b/examples/physics.fl
index 1a75ba1..2e570aa 100644
--- a/examples/physics.fl
+++ b/examples/physics.fl
@@ -31,8 +31,8 @@ let bubble_radii=MIN_BUBBLE_RADIUS+(1+noise(:radii, i*NOISE_SCALE, NOISE_T))/2*(
     !record filename=('blobs.m4v' if RECORD and CYCLE > 1) codec=:hevc crf=20 limit=30
         @bloom_filter size=SIZE/2 radius=hypot(SIZE)/300
             !canvas3d viewpoint=0;0;-1200 focus=0 near=1 far=2000 fov=60/360 samples=4
-                !light color=2;0;0
-                !light color=-1.9;0;0 direction=0;0;1
+                -- !light color=2;0;0
+                !light color=1;0;0 direction=0;0;1
                 !sphere subdivisions=3 color=0.01 size=RADIUS position=$(:bubble;:centre) transparency=0.05
                 for i in ..NBUBBLES
                     !sphere subdivisions=3 position=$(:bubble;i) color=bubble_brightness[i] size=bubble_radii[i]

This turns off the ambient lighting and changes the directional light to a simple (positive) red light.
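A rough sketch of what the change does to the red channel. It assumes a light without a `direction` is purely ambient and that a directional light contributes `color * max(0, cos θ)`, where θ is the angle between the surface normal and the direction back towards the light (plain Lambertian diffuse; flitter's real shader may do more than this). `shade`, `ORIGINAL` and `MODIFIED` are hypothetical names.

```python
# Each light is (red intensity, is_directional).
ORIGINAL = [(2.0, False), (-1.9, True)]  # ambient 2;0;0 plus directional -1.9;0;0
MODIFIED = [(1.0, True)]                 # directional 1;0;0 only


def shade(lights, cos_theta: float) -> float:
    """Red-channel intensity at a point whose normal makes angle theta with the
    direction towards the light (assumed Lambertian model)."""
    total = 0.0
    for intensity, directional in lights:
        if directional:
            total += intensity * max(0.0, cos_theta)
        else:
            total += intensity  # ambient: the same everywhere
    return total


if __name__ == "__main__":
    print("cos(theta)  original  modified")
    for cos_theta in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(f"{cos_theta:10.2f}  {shade(ORIGINAL, cos_theta):8.2f}  {shade(MODIFIED, cos_theta):8.2f}")
```

Under these assumptions the original pair makes each sphere darkest where it faces the light head-on (about 0.1) and brightest at its rim (2.0), whereas the modified version is a simple diffuse falloff from 1 to 0, so a hard black edge should be easier to spot.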

mdales commented 8 months ago

This is with colorbits=8 again, as with 16 I still got the harsher outer shell on the big sphere, but do say if you want to see that one.

https://github.com/jonathanhogg/flitter/assets/28506/6fa7079c-784b-45fb-99c1-7654ca6a5819

jonathanhogg commented 8 months ago

Right. I'm trying to make sense visually of what's going on here. It looks like each sphere has a thick black outline. This includes the semi-transparent sphere that the others are bubbling within – I think that's the main reason it looks like two completely different spheres. The bloom filter on top looks to be working fine – it is smearing the resulting sharp edge.

I can't see a mechanism by which OpenGL would draw outlines like this accidentally (drawing outlines is hard even deliberately), so my gut feeling is that what is actually happening is that the inside edge of each sphere is being shaded black by the fragment shader. The only obvious way I can see this happening is if the bug is related to the surface normal and the calculation of its dot product with the light direction.

On my Mac, I see a smooth gradient at the edge of each sphere, whereas in the video above it seems to cut hard from bright red to black.

I'd be curious what it looks like if we simplify the example down a bit. Perhaps just to:

let SIZE=1080;1080

!window size=SIZE colorbits=16
    !canvas
        !rect size=SIZE
        !fill color=0;0;1
    !canvas3d viewpoint=0;0;-1200 focus=0 near=1 far=2000 fov=60/360 samples=4
        !light color=1 direction=0;0;1
        !sphere subdivisions=3 color=1;0;0 size=400

which on my Mac looks like this:

[Screenshot 2023-11-16 at 09 51 55]

mdales commented 8 months ago

Wow, yeah, that's quite different :)

[Screenshot 2023-11-16 at 10 47 44]

jonathanhogg commented 8 months ago

Yeah, that's not working. It's either the calculation of the normal or the calculation of the diffuse colour strength. The latter is just a simple dot product and I can't imagine that being a problem, so I would guess it's the former, which involves a more complex matrix inversion.
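For context, the usual way to transform normals is with the "normal matrix": the inverse transpose of the upper-left 3×3 of the model matrix. Below is a pure-Python sketch of that standard technique (not flitter's actual shader code; all names are hypothetical). It shows why the inverse is needed under non-uniform scaling, and that the division by the determinant is where a near-singular matrix could produce very large values.

```python
Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(dot(row, v) for row in m)


def inverse_transpose(m: Mat3) -> Mat3:
    """Return (m^-1)^T, computed as the cofactor matrix divided by det(m)."""
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # 2x2 minor: delete row i and column j
            minor = [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
            cof[i][j] = (-1) ** (i + j) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])
    det = sum(m[0][j] * cof[0][j] for j in range(3))  # expansion along the first row
    if det == 0.0:
        raise ZeroDivisionError("singular model matrix: normals cannot be transformed")
    # A tiny det makes these entries huge, which in a shader could overflow.
    return [[cof[i][j] / det for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    model = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # stretch x by 2
    tangent = mat_vec(model, (1.0, -1.0, 0.0))   # a surface direction, transformed
    normal = (1.0, 1.0, 0.0)                     # perpendicular to it before transform
    naive = mat_vec(model, normal)
    correct = mat_vec(inverse_transpose(model), normal)
    print("naive   . tangent =", dot(naive, tangent))    # not perpendicular any more
    print("correct . tangent =", dot(correct, tangent))  # still perpendicular
```

With a uniform scale or a pure rotation the inverse transpose is proportional to the model matrix itself, so the extra work only matters for non-uniform scaling; the normal still needs normalising afterwards either way.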

Here's a slightly more sophisticated example to test:


let SIZE=1080;1080

!window size=SIZE colorbits=16
    !canvas
        !rect size=SIZE
        !fill color=0.5
    !canvas3d viewpoint=0;0;-1200 focus=0 near=1 far=2000 fov=60/360 samples=4
        !light color=1;0;0 direction=0;0;1
        !light color=0;1;0 direction=1;0;0
        !light color=0;0;1
        !sphere subdivisions=3 color=1 size=400

It should look like this:

[Screenshot 2023-11-16 at 12 26 22]

mdales commented 8 months ago

And our survey says...

[Screenshot 2023-11-16 at 13 05 19]

jonathanhogg commented 8 months ago

Wow. That is extremely broken, in ways that suggest NaNs or infinities, rather than zeroes, are getting into the mix.
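A quick illustration of why NaNs in particular could give inconsistent results across drivers: under IEEE 754 every ordered comparison with NaN is false, so whether a `max` or `clamp` lets a NaN through depends on how it happens to be implemented. Python's built-in `max` keeps its first argument unless a later one compares greater, which makes the effect easy to see. As far as I know, GLSL leaves NaN behaviour largely undefined, so two drivers can disagree without either being wrong. `clamped_diffuse` and `pixel` are hypothetical helpers.

```python
import math

NAN = float("nan")
INF = float("inf")

# Ordinary-looking arithmetic can produce NaN from non-NaN inputs.
assert math.isnan(INF - INF)
assert math.isnan(0.0 * INF)


def clamped_diffuse(cos_theta: float, value_first: bool) -> float:
    """Clamp a diffuse term at zero; the argument order changes NaN handling."""
    return max(cos_theta, 0.0) if value_first else max(0.0, cos_theta)


def pixel(ambient: float, cos_theta: float, value_first: bool) -> float:
    """Ambient plus clamped diffuse: a single NaN poisons the whole sum."""
    return ambient + clamped_diffuse(cos_theta, value_first)


if __name__ == "__main__":
    print(pixel(0.5, NAN, value_first=True))   # nan: the NaN survives the clamp
    print(pixel(0.5, NAN, value_first=False))  # 0.5: the NaN is silently dropped
```

If a NaN pixel then ends up displayed as black (a driver detail I can't confirm), you would get black regions even where an ambient light ought to guarantee some colour.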

jonathanhogg commented 8 months ago

Let's try and reason this out.

There are three light sources in this example: a red directional light, a green directional light and a blue ambient light.

The blue ambient light should be present across the entire surface of the sphere, so the fact that we have black sections suggests that there is a numerical error leading to the color variable in the shader being corrupted with an infinity or a NaN.

The section of the sphere that is lit looks to be correct, so the calculations aren't completely wrong.

The black border on the left looks to be an interaction with the calculation of the intensity of the red surface reflection, which is the dot product of the surface normal and the light direction. The border sits roughly 78% of the way out from the centre, which corresponds to a cosine of about $0.63$. So values of the dot product below this are broken. I can't think of anything special about this number.
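The $0.63$ figure can be checked with a little geometry, assuming the sphere is seen roughly head-on (perspective ignored) and the red light shines along the view axis. A point that appears at a fraction $r$ of the sphere's radius from the centre of its disc has a unit normal whose component along the view axis is $\sqrt{1 - r^2}$, so:

$$
\cos\theta = \sqrt{1 - r^2} = \sqrt{1 - 0.78^2} = \sqrt{0.3916} \approx 0.626
$$

and, going the other way, $\cos\theta = 0.63$ gives $r = \sqrt{1 - 0.63^2} = \sqrt{0.6031} \approx 0.78$.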

The fact that the entire left side is dark suggests that the green light is the source of the error there: that whole side should have a negative cosine, which the shader should be clamping to 0. However, if cosines below $0.63$ were at fault, then the boundary shouldn't fall in the middle of the sphere here.
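To make the expected result concrete, here is a sketch of what the three lights in the test scene should contribute at a few surface normals. It assumes a light with no `direction` is ambient, that `direction` is the way the light travels (so the lit side faces the opposite way), and plain Lambertian diffuse shading with `color=1`. These are my assumptions about the shading model rather than flitter's actual shader, and `expected_rgb` is a hypothetical helper. The point it makes is the one above: with the blue ambient term, no point on the sphere should ever be black.

```python
import math

Vec3 = tuple[float, float, float]

# The test scene's lights: (colour, direction, or None for ambient).
LIGHTS = [
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),  # red, directional
    ((0.0, 1.0, 0.0), (1.0, 0.0, 0.0)),  # green, directional
    ((0.0, 0.0, 1.0), None),             # blue, ambient
]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def expected_rgb(normal: Vec3, lights=LIGHTS, albedo: float = 1.0) -> Vec3:
    """Sum each light's contribution at a unit surface normal."""
    rgb = [0.0, 0.0, 0.0]
    for colour, direction in lights:
        if direction is None:
            k = 1.0  # ambient: uniform over the whole surface
        else:
            # Assumption: the lit side faces against the direction of travel.
            k = max(0.0, -dot(normal, direction))
        for c in range(3):
            rgb[c] += albedo * colour[c] * k
    return tuple(rgb)


if __name__ == "__main__":
    for n in [(0.0, 0.0, -1.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
              (-math.sqrt(0.5), 0.0, -math.sqrt(0.5))]:
        print(n, "->", tuple(round(v, 3) for v in expected_rgb(n)))
```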

?

jonathanhogg commented 6 months ago

So, exciting times. Gary just came round with his new Apple Silicon MBP and I've been able to do some more testing. The bug is somewhere in the Phong lighting shader and is clearly down to some difference in how the Apple Silicon and AMD OpenGL drivers deal with a numerical edge case.
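The specific edge case hasn't been pinned down, but as a hypothetical illustration of the kind of thing that can go wrong in a Phong shader: the specular term raises a cosine to a power, and GLSL's `pow(x, y)` is undefined for `x < 0` (and for `x = 0` with `y <= 0`), so an unclamped negative base can give different garbage on different drivers. The sketch below (names hypothetical, not flitter's shader) clamps the base first; Python's `math.pow` raises `ValueError` for the negative case rather than returning a value.

```python
import math

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def reflect(i: Vec3, n: Vec3) -> Vec3:
    """GLSL-style reflect(I, N) = I - 2 * dot(N, I) * N, with N of unit length."""
    d = dot(n, i)
    return tuple(ic - 2.0 * d * nc for ic, nc in zip(i, n))


def phong_specular(normal: Vec3, to_light: Vec3, to_viewer: Vec3, shininess: float) -> float:
    """Phong specular factor; vectors are unit length and shininess > 0."""
    incident = tuple(-c for c in to_light)  # direction the light travels
    cos_alpha = dot(reflect(incident, normal), to_viewer)
    # Clamp before pow: a negative base with a fractional exponent has no real
    # result (GLSL leaves it undefined; math.pow raises ValueError).
    return math.pow(max(cos_alpha, 0.0), shininess)


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    print(phong_specular(up, up, up, 10.5))                # highlight seen head-on
    print(phong_specular(up, up, (0.0, 0.0, -1.0), 10.5))  # viewer behind: clamped to 0
```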

Anyways, I am in the process of switching to a new PBR lighting model and have a work-in-progress implementation in pull request #27. The great news is that this new model renders perfectly on Apple Silicon (or, to be precise, is no more broken than I already know it is).