linebender / vello

A GPU compute-centric 2D renderer.
http://linebender.org/vello/
Apache License 2.0

Full system hang on Apple M1 8GB #548

Closed jrmoulton closed 1 month ago

jrmoulton commented 6 months ago

I'm experimenting with swapping in Vello as the renderer for floem, and when using Vello I'm running into a full system hang.

I get this issue in the with_winit example in the Vello examples and also in the editor example in floem.

In the with_winit example it happens when I zoom in too far on the Ghostscript tiger. I've noticed there is a limit to how far I can zoom in, and hitting it causes some stuttering, but that is separate from the system becoming unresponsive.

In the editor example the issue is caused when I delete several characters from the starting text in the editor.

Prior to the hang, macOS Activity Monitor doesn't indicate high memory pressure.

The issue isn't consistent (and reproducing it takes a long time), but it does happen regularly (within 30 seconds of doing the actions listed above).

Apple M1 MacBook Air, 8 GB RAM, macOS Sonoma 14.2.1 (23C71)

bram209 commented 6 months ago

I had the same issue when aggressively zooming in and out. No stuttering, but suddenly the whole system hangs. I am on an M2 Pro with 32GB.

raphlinus commented 6 months ago

We'll look into it. A good way of isolating this is to turn on --use-cpu, then edit the code to set force_gpu_from to various values. That said, when doing aggressive zooming, it's most likely that there will be a panic from buffer overflow - one of the things to do soon is to change the CPU shaders to match the behavior of the GPU shaders (report a failure and continue).
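
For illustration, a minimal sketch of that debugging setup (force_gpu_from is a variable in the with_winit example source; "coarse" as a stage name is quoted later in this thread):

    // Run the example with every pipeline stage on the CPU shaders:
    //     cargo run -p with_winit -- --use-cpu
    // Then, in the with_winit source, bisect by naming the first stage
    // that should go back to running on the GPU:
    let force_gpu_from: Option<&str> = Some("coarse");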

armansito commented 6 months ago

I ran into this with some scene content today. I'll investigate.

DJMcNab commented 5 months ago

Hi @jrmoulton. We think that #551 or #553 might have fixed this issue. Does this still happen for you on main?

raphlinus commented 5 months ago

I'm able to repro this fairly straightforwardly on M1. There are two separate issues, both of which need to be fixed.

The first is that zooming in causes unboundedly large memory usage, specifically in flatten. This is because it's flattening all outlines, using the transform to determine the number of subdivisions. What needs to happen is culling the flattened line segments to the viewport. That's also related to vello#542.

The second problem is that when the "line soup" buffer overflows, that should be detected and all downstream work should early-out. This is what #553 was trying to address, but it seems it didn't catch every case. What should happen is that a test in the binning stage sees that bump.lines > config.lines_size and sets a bit in failure based on that, and then all downstream shaders stop work.
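
A rough WGSL sketch of that check, as a non-authoritative illustration (the failed field on bump and the STAGE_BINNING flag are assumed names, not necessarily the actual code):

    // In binning: detect that flatten overflowed the line soup buffer,
    // record the failure, and let downstream shaders early-out on it.
    if atomicLoad(&bump.lines) > config.lines_size {
        atomicOr(&bump.failed, STAGE_BINNING);
        return;
    }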

I did a little validation of this, mostly observing a panic when using --use-cpu. To really track it down, I'll want to use a machine that doesn't hard-lock on failure, probably my AMD on Windows.

waywardmonkeys commented 5 months ago

My M3 Pro Max 64GB doesn't want to start correctly again after this; I can't seem to log in successfully.

edit: third time was the charm

raphlinus commented 4 months ago

A couple of updates as I dig into this. First, it does not seem to repro on v0.1.0. Second, I had originally suspected #537, but it repros on the parent of that commit. My current working hypothesis is that flatten itself is getting stuck. I've started testing with all downstream shaders ablated; I haven't seen a full hard hang, but I have seen the GPU get into a bad state.

If this is the case, then there's likely a reasonably quick patch that just limits the amount of subdivision in flatten. I'm also starting to wonder whether the best approach is going to be aggressive culling to the viewport in flatten; if nothing else, that would be a major performance improvement in the highly zoomed-in case.

It's not clear to me why this would be triggered from floem; I'm wondering whether there's something invalid about the scene. Could you provide more detailed repro steps?

raphlinus commented 4 months ago

I've locally tried applying this patch:

diff --git a/shader/flatten.wgsl b/shader/flatten.wgsl
index 80da188..2126ad3 100644
--- a/shader/flatten.wgsl
+++ b/shader/flatten.wgsl
@@ -434,7 +434,7 @@ fn flatten_euler(
                 let integrand_peak = sqrt(abs(k_peak * (k_peak * dist_scaled + 1.0)));
                 n_frac = integral * integrand_peak / a;
             }
-            let n = max(ceil(n_frac * scale_multiplier), 1.0);
+            let n = clamp(ceil(n_frac * scale_multiplier), 1.0, 100.0);
             for (var i = 0u; i < u32(n); i++) {
                 var lp1: vec2f;
                 if i + 1u == u32(n) && t1 == 1.0 {

Rendering becomes very slow when zoomed in at large factors, but it doesn't cause a full system hang. That's possibly something to try with the floem use case. Another thing to try is turning on CPU shaders - I expect it to panic with an out-of-bounds when writing to LineSoup from flatten.

jrmoulton commented 4 months ago

@DJMcNab noticed that in my usage of Vello in Floem I wasn't ever doing a scene reset.

After adding a scene reset I no longer experience a hang. I don't know whether that counts as resolving the issue, but it does unblock me from further integration with Vello.
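
For anyone hitting the same thing, a minimal sketch of that fix (Scene::reset clears previously encoded content; the rest of the frame loop is elided):

    // Without this, every frame's encoding is appended to the previous
    // frames' content, so the scene grows without bound.
    scene.reset();
    // ... encode this frame's content into `scene`, then render as usual ...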

XdaTk commented 2 months ago

The problem still exists.

OS: macOS Monterey 12.6.7 x86_64
Host: MacBook Pro (16-inch, 2019)
Kernel: Darwin 21.6.0
Display (Color LCD): 4096x2560 @ 60Hz (as 2048x1280) [Built-in] *
Display (DELL P2422H): 1080x1920 @ 60Hz [External]
CPU: Intel(R) Core(TM) i7-9750H (12) @ 2.60 GHz
GPU 1: Intel UHD Graphics 630 [Integrated]
GPU 2: AMD Radeon Pro 5300M (0 B / 3.98 GiB, 0%) 
Memory: 11.64 GiB / 16.00 GiB (73%)
Swap: 637.50 MiB / 2.00 GiB (31%)
Locale: zh_CN.UTF-8

DJMcNab commented 2 months ago

@XdaTk can you please provide reproduction steps for what you're seeing?

XdaTk commented 2 months ago

@XdaTk can you please provide reproduction steps for what you're seeing?

git log 

commit 94ce032d53f3ec48d90e7bbbdf739aaae8a40714 (HEAD -> main, origin/main, origin/HEAD)
Author: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>
Date:   Mon Jul 22 10:27:20 2024 +0100

rustc --version
rustc 1.81.0-nightly (5315cbe15 2024-07-11)

cargo run -p with_winit

# Reboot

rustup default stable
rustc 1.79.0 (129f3b996 2024-06-10)
cargo run -p with_winit
cargo build
  Downloaded wasm-bindgen-wasm-conventions v0.2.92
  Downloaded wasm-bindgen-shared v0.2.92
  Downloaded leb128 v0.2.5
  Downloaded fallible-iterator v0.2.0
  Downloaded walrus-macro v0.19.0
  Downloaded id-arena v2.2.1
  Downloaded heck v0.3.3
  Downloaded wasm-bindgen-multi-value-xform v0.2.92
  Downloaded wasm-bindgen-wasm-interpreter v0.2.92
  Downloaded base64 v0.21.7
  Downloaded wasm-bindgen-threads-xform v0.2.92
  Downloaded wasm-encoder v0.29.0
  Downloaded miniz_oxide v0.7.3
  Downloaded indexmap v1.9.3
  Downloaded wasmparser v0.80.2
  Downloaded hashbrown v0.12.3
  Downloaded wasm-bindgen-cli-support v0.2.92
  Downloaded cc v1.0.98
  Downloaded serde_json v1.0.117
  Downloaded walrus v0.20.3
  Downloaded gimli v0.26.2
  Downloaded devserver_lib v0.4.2
  Downloaded nv-flip-sys v0.1.1
  Downloaded wasm-bindgen-externref-xform v0.2.92
  Downloaded cargo-run-wasm v0.4.0
  Downloaded nv-flip v0.1.2
  Downloaded 26 crates (2.0 MB) in 1.79s
   Compiling proc-macro2 v1.0.85
   Compiling unicode-ident v1.0.12
   Compiling autocfg v1.3.0
   Compiling libc v0.2.155
   Compiling cfg-if v1.0.0
   Compiling log v0.4.21
   Compiling bitflags v2.5.0
   Compiling arrayvec v0.7.4
   Compiling thiserror v1.0.61
   Compiling smallvec v1.13.2
   Compiling hashbrown v0.14.5
   Compiling anyhow v1.0.86
   Compiling termcolor v1.4.1
   Compiling equivalent v1.0.1
   Compiling cfg_aliases v0.1.1
   Compiling unicode-width v0.1.13
   Compiling bitflags v1.3.2
   Compiling hexf-parse v0.2.1
   Compiling core-foundation-sys v0.8.6
   Compiling rustc-hash v1.1.0
   Compiling unicode-xid v0.2.4
   Compiling parking_lot_core v0.9.10
   Compiling codespan-reporting v0.11.1
   Compiling foreign-types-shared v0.3.1
   Compiling once_cell v1.19.0
   Compiling scopeguard v1.2.0
   Compiling num-traits v0.2.19
   Compiling lock_api v0.4.12
   Compiling bit-vec v0.6.3
   Compiling paste v1.0.15
   Compiling indexmap v1.9.3
   Compiling wgpu-hal v0.21.1
   Compiling indexmap v2.2.6
   Compiling bit-set v0.5.3
   Compiling block v0.1.6
   Compiling raw-window-handle v0.6.2
   Compiling syn v1.0.109
   Compiling wgpu-core v0.21.1
   Compiling wgpu-types v0.20.0
   Compiling libloading v0.8.3
   Compiling hashbrown v0.12.3
   Compiling profiling v1.0.15
   Compiling litrs v0.4.1
   Compiling quote v1.0.36
   Compiling unicode-segmentation v1.11.0
   Compiling wgpu v0.20.1
   Compiling kurbo v0.11.0
   Compiling syn v2.0.66
   Compiling svg_fmt v0.4.3
   Compiling heck v0.3.3
   Compiling leb128 v0.2.5
   Compiling utf8parse v0.2.1
   Compiling objc-sys v0.3.5
   Compiling document-features v0.2.8
   Compiling stable_deref_trait v1.2.0
   Compiling euclid v0.22.10
   Compiling fallible-iterator v0.2.0
   Compiling simd-adler32 v0.3.7
   Compiling anstyle-parse v0.2.4
   Compiling core-foundation v0.9.4
   Compiling malloc_buf v0.0.6
   Compiling objc v0.2.7
   Compiling wasm-encoder v0.29.0
   Compiling gimli v0.26.2
   Compiling static_assertions v1.1.0
   Compiling id-arena v2.2.1
   Compiling parking_lot v0.12.3
   Compiling core-graphics-types v0.1.3
   Compiling peniko v0.1.1
   Compiling is_terminal_polyfill v1.70.0
   Compiling anstyle-query v1.1.0
   Compiling colorchoice v1.0.1
   Compiling wasmparser v0.80.2
   Compiling adler v1.0.2
   Compiling anstyle v1.0.7
   Compiling miniz_oxide v0.7.3
   Compiling guillotiere v0.6.2
   Compiling anstream v0.6.14
   Compiling getrandom v0.2.15
   Compiling crc32fast v1.4.2
   Compiling futures-core v0.3.30
   Compiling memchr v2.7.2
   Compiling objc2-encode v4.0.3
   Compiling flate2 v1.0.30
   Compiling futures-intrusive v0.5.0
   Compiling rand_core v0.6.4
   Compiling fdeflate v0.3.4
   Compiling objc2 v0.5.2
   Compiling regex-syntax v0.8.3
   Compiling strsim v0.11.1
   Compiling aho-corasick v1.1.3
   Compiling crossbeam-utils v0.8.20
   Compiling clap_lex v0.7.0
   Compiling zune-core v0.4.12
   Compiling heck v0.5.0
   Compiling ppv-lite86 v0.2.17
   Compiling clap_builder v4.5.2
   Compiling zune-jpeg v0.4.11
   Compiling png v0.17.13
   Compiling cfg_aliases v0.2.1
   Compiling rustix v0.38.34
   Compiling rand_chacha v0.3.1
   Compiling block2 v0.5.1
   Compiling serde v1.0.203
   Compiling regex-automata v0.4.6
   Compiling dispatch v0.2.0
   Compiling byteorder v1.5.0
   Compiling objc2-foundation v0.2.2
   Compiling rand v0.8.5
   Compiling winit v0.30.3
   Compiling walrus-macro v0.19.0
   Compiling errno v0.3.9
   Compiling tracing-core v0.1.32
   Compiling pin-project-lite v0.2.14
   Compiling pollster v0.3.0
   Compiling serde_json v1.0.117
   Compiling same-file v1.0.6
   Compiling roxmltree v0.20.0
   Compiling wasm-bindgen-shared v0.2.92
   Compiling cc v1.0.98
   Compiling walrus v0.20.3
   Compiling walkdir v2.5.0
   Compiling crossbeam-channel v0.5.13
   Compiling filetime v0.2.23
   Compiling fsevent-sys v4.1.0
   Compiling humantime v2.1.0
   Compiling ryu v1.0.18
   Compiling dpi v0.1.1
   Compiling cursor-icon v1.1.0
   Compiling fastrand v2.1.0
   Compiling smol_str v0.2.2
   Compiling itoa v1.0.11
   Compiling notify v6.1.1
   Compiling regex v1.10.4
   Compiling base64 v0.21.7
   Compiling rustc-demangle v0.1.24
   Compiling nv-flip-sys v0.1.1
   Compiling env_filter v0.1.0
   Compiling env_logger v0.11.3
   Compiling thiserror-impl v1.0.61
   Compiling foreign-types-macros v0.2.3
   Compiling bytemuck_derive v1.7.0
   Compiling clap_derive v4.5.4
   Compiling wasm-bindgen-wasm-conventions v0.2.92
   Compiling tracing-attributes v0.1.27
   Compiling foreign-types v0.5.0
   Compiling metal v0.28.0
   Compiling core-graphics v0.23.2
   Compiling bytemuck v1.16.0
   Compiling naga v0.20.0
   Compiling font-types v0.5.5
   Compiling read-fonts v0.19.3
   Compiling image v0.25.1
   Compiling tracing v0.1.40
   Compiling objc2-app-kit v0.2.2
   Compiling clap v4.5.4
   Compiling wasm-bindgen-wasm-interpreter v0.2.92
   Compiling wasm-bindgen-threads-xform v0.2.92
   Compiling wasm-bindgen-multi-value-xform v0.2.92
   Compiling wasm-bindgen-externref-xform v0.2.92
   Compiling tempfile v3.10.1
   Compiling wasm-bindgen-cli-support v0.2.92
   Compiling notify-debouncer-mini v0.4.1
   Compiling vello_tests v0.0.0 (*rust/vello/vello_tests)
   Compiling devserver_lib v0.4.2
   Compiling pico-args v0.5.0
   Compiling nv-flip v0.1.2
   Compiling cargo-run-wasm v0.4.0
   Compiling run_wasm v0.0.0 (*rust/vello/examples/run_wasm)
   Compiling skrifa v0.19.3
   Compiling vello_shaders v0.2.0 (*rust/vello/vello_shaders)
warning: fields `module` and `module_info` are never read
  --> vello_shaders/src/compile/mod.rs:80:9
   |
78 | pub struct ShaderInfo {
   |            ---------- fields in this struct
79 |     pub source: String,
80 |     pub module: Module,
   |         ^^^^^^
81 |     pub module_info: ModuleInfo,
   |         ^^^^^^^^^^^
   |
   = note: `ShaderInfo` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

warning: field `name` is never read
  --> vello_shaders/src/types.rs:34:9
   |
33 | pub struct BindingInfo {
   |            ----------- field in this struct
34 |     pub name: Option<String>,
   |         ^^^^
   |
   = note: `BindingInfo` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis

warning: fields `size_in_bytes` and `index` are never read
  --> vello_shaders/src/types.rs:41:9
   |
40 | pub struct WorkgroupBufferInfo {
   |            ------------------- fields in this struct
41 |     pub size_in_bytes: u32,
   |         ^^^^^^^^^^^^^
42 |     /// The order in which th...
43 |     pub index: u32,
   |         ^^^^^
   |
   = note: `WorkgroupBufferInfo` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis

warning: `vello_shaders` (build script) generated 3 warnings
   Compiling vello_encoding v0.2.0 (*rust/vello/vello_encoding)
warning: field `th1` is never read
  --> vello_shaders/src/cpu/euler.rs:36:9
   |
34 | pub struct EulerParams {
   |            ----------- field in this struct
35 |     pub th0: f32,
36 |     pub th1: f32,
   |         ^^^
   |
   = note: `EulerParams` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

warning: `vello_shaders` (lib) generated 1 warning
   Compiling wgpu-profiler v0.17.0
   Compiling vello v0.2.0 (*rust/vello/vello)
   Compiling scenes v0.0.0 (*rust/vello/examples/scenes)
   Compiling simple v0.0.0 (*rust/vello/examples/simple)
   Compiling with_winit v0.0.0 (*rust/vello/examples/with_winit)
   Compiling headless v0.0.0 (*rust/vello/examples/headless)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 58.75s

It got stuck and nothing responded except the mouse. I had to restart the machine.

DJMcNab commented 2 months ago

Oh, I didn't realise that your machine isn't an M1 machine; posting in this issue implied that it was.

Would you mind creating a new issue for the behaviour you're seeing? As a starting point, could you please determine which GPU is seeing this crash?

sfjohnson commented 2 months ago

I'm still getting the full system hang on the main branch, but a little differently from the above. I'm on an 8GB M2, Ventura 13.6.7. The issue happens immediately when launching with_winit, without zooming.

let force_gpu_from = Some("fine_area"); doesn't hang, but let force_gpu_from = Some("coarse"); does, so I'm getting the hang even with flatten on the CPU. macOS is reporting:

Termination Reason:    Namespace WATCHDOG, Code 1 monitoring timed out for service
(1 monitored services unresponsive): checkin with service: WindowServer returned not alive with context:
unresponsive work processor(s): WindowServer main thread

DJMcNab commented 2 months ago

Can you please confirm which commit you're using, @sfjohnson? We had a memory leak issue which was fixed today (#661), and I could see it causing this kind of issue.

sfjohnson commented 2 months ago

It was the latest 59c0fa572d193fbd22fbb204a80d086646156128 with the fix applied.

DJMcNab commented 2 months ago

Can you determine whether this was a regression? If so, which commit introduced it?

We have several developers on M1 family chips, so your experience is surprising to me.

sfjohnson commented 2 months ago

Found it! It's 1daf2a4. If I apply compilation_options: PipelineCompilationOptions::default() back on top of the main branch, it works.

DJMcNab commented 2 months ago

Hmm, that's concerning. Do you think you can extract the full MSL for the relevant shader with and without that setting?

Is it coarse which is failing, or path_tiling?

sfjohnson commented 2 months ago

Looks like it's the same MSL regardless of compilation_options. It's coarse which is failing: coarse.metal.txt

DJMcNab commented 2 months ago

Hmm, the same MSL being generated doesn't track with my expectations. The only thing that compilation_options does is pass a different argument to naga, changing what MSL is generated.
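
For context, a sketch of the setting in question (the field name matches wgpu's PipelineCompilationOptions; that 1daf2a4 changed exactly this field is inferred from the "ziwm" file names below):

    // wgpu's default zero-initializes workgroup memory; 1daf2a4 appears
    // to have turned that off.
    let options = wgpu::PipelineCompilationOptions {
        zero_initialize_workgroup_memory: false,
        ..Default::default()
    };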

sfjohnson commented 2 months ago

I double checked and got the same result. I'm not sure if I collected the MSL correctly though; I did:

  1. Change to default = ["wgsl", "full", "cpu", "msl"] in vello_shaders/Cargo.toml
  2. cargo build
  3. Copy the MSL string out of target/debug/build/vello_shaders-<some hex>/out/shaders.rs
  4. Write it to a file, converting the escaped newlines
  5. Undo 1daf2a4
  6. cargo clean; cargo build
  7. Repeat 3 and 4

Is that right?

DJMcNab commented 2 months ago

Those aren't the shaders being generated by wgpu - those are shaders generated by vello_shaders for third-party users of our shaders.

You wouldn't need to change any features to get the shaders from wgpu, although unfortunately I don't know the best way. I think it might involve either adding debug prints inside wgpu, or using your system's GPU debugging tools.

sfjohnson commented 2 months ago

OK, I think this makes more sense now; I am logging from inside wgpu. The diff is as follows (it hangs when this block is not present):

if (metal::all(local_id == metal::uint3(0u))) {
    for (int __i0 = 0; __i0 < 8; __i0++) {
        for (int __i1 = 0; __i1 < 256; __i1++) {
            metal::atomic_store_explicit(&sh_bitmaps.inner[__i0].inner[__i1], 0, metal::memory_order_relaxed);
        }
    }
    sh_part_count = {};
    sh_part_offsets = {};
    sh_drawobj_ix = {};
    sh_tile_stride = {};
    sh_tile_width = {};
    sh_tile_x0y0_ = {};
    sh_tile_count = {};
    sh_tile_base = {};
}
metal::threadgroup_barrier(metal::mem_flags::mem_threadgroup);

Full sources: coarse-ziwm-false.metal.txt coarse-ziwm-true.metal.txt

DJMcNab commented 2 months ago

I realise that this will be quite hard to do, but do you think you could isolate which of those is required?

The easiest one to validate would be the barrier, because you can just add a workgroupBarrier() at the start of the shader.

Thanks so much for being so patient with debugging this so far!

sfjohnson commented 2 months ago

It seems to hang unless everything is cleared, and I had to add some extra barriers. Here's what I have working, added to the start of fn main in coarse.wgsl:

    for (var i = 0u; i < N_SLICE; i += 1u) {
        for (var j = 0u; j < N_TILE; j += 1u) {
            atomicStore(&sh_bitmaps[i][j], 0u);
        }
    }

    for (var i = 0u; i < WG_SIZE; i += 1u) {
        workgroupBarrier();
        sh_part_count[i] = 0u;
        sh_part_offsets[i] = 0u;
        sh_drawobj_ix[i] = 0u;
        sh_tile_stride[i] = 0u;
        sh_tile_width[i] = 0u;
        sh_tile_x0y0[i] = 0u;
        sh_tile_count[i] = 0u;
        sh_tile_base[i] = 0u;
    }

    workgroupBarrier();

Note that this might not be completely optimal, as I've never written WGSL before and I'm trying to minimise subjecting my computer to lots of hard reboots. Fortunately it seems this is all that is required; all the other shaders work without zero-initialisation.

DJMcNab commented 2 months ago

That clearing routine is UB: every invocation in the workgroup writes the same shared-memory locations non-atomically, which is a data race. The pattern you actually want in this case is:

buffer[local_id.x] = 0;

for each buffer, and not in a loop.

I wonder if the buffers start in a poison state, so Metal now decides that it can just do UB?

sfjohnson commented 2 months ago

Oh I see, like this, right? (I removed one barrier and it still works):

    for (var i = 0u; i < N_SLICE; i += 1u) {
        for (var j = 0u; j < N_TILE; j += 1u) {
            atomicStore(&sh_bitmaps[i][j], 0u);
        }
    }

    sh_part_count[local_id.x] = 0u;
    sh_part_offsets[local_id.x] = 0u;
    sh_drawobj_ix[local_id.x] = 0u;
    sh_tile_stride[local_id.x] = 0u;
    sh_tile_width[local_id.x] = 0u;
    sh_tile_x0y0[local_id.x] = 0u;
    sh_tile_count[local_id.x] = 0u;
    sh_tile_base[local_id.x] = 0u;

    workgroupBarrier();

raphlinus commented 2 months ago

I'm desk-checking the code now to see if there's any uninitialized read. Would it be possible to isolate which of these initializations is responsible?

Also, the pattern of initializing sh_bitmaps is way less efficient than it could be (though not undefined behavior, as the store is atomic). A better pattern is the initialization on lines 205-207.
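
As a sketch, that strided pattern looks roughly like the following (assuming the inner dimension of sh_bitmaps matches the workgroup size, so each invocation clears one entry per slice):

    for (var i = 0u; i < N_SLICE; i += 1u) {
        atomicStore(&sh_bitmaps[i][local_id.x], 0u);
    }
    workgroupBarrier();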

sfjohnson commented 2 months ago

Hmm, unfortunately, while trying to isolate each initialisation, things stopped being predictable. Now the code I posted above sometimes causes a hang, so it looks like the bug might not actually be isolated to coarse.wgsl. It's quite troublesome to debug due to all the hard reboots, and I'm concerned about data corruption. Maybe there's a way to test in a VM with GPU access?

raphlinus commented 2 months ago

I'm also quite willing to dig into this myself, but it's unclear how to repro. Just so I understand, it's failing just running the default scene, nothing special? That certainly works on my machine (M1 Pro, 14.2.1).

It's certainly possible that there's an uninitialized memory read elsewhere in the pipeline, that was getting masked by the zeroing.

sfjohnson commented 2 months ago

I just double checked and yeah it's super easy to repro for me just by cloning the repo and running cargo run -p with_winit. I get an instant hang with nothing rendered in the window. I'm on the latest stable Rust. Not sure what is different about my computer but I might look into this and see how it runs Metal.

94bryanr commented 1 month ago

I am also running into this problem on an M2 Mac. In my case, this happens consistently on version 0.2.1. Here is a video of the behavior: https://www.youtube.com/watch?v=y5-IIJHvLgY. I am not changing the scene at all during that video, just resizing the window.

The behavior from the video is happening here: https://www.cocube.com/console. Happy to work with you to fix this (it's not great UX to crash someone's computer from your website) and I'd like to stick with Vello. I've looked over the code for the shaders and have a decent enough high-level understanding to try making some changes, but I could still use some guidance.

waywardmonkeys commented 1 month ago

@94bryanr Just for extra info, how much memory does your M2 Mac have?

raphlinus commented 1 month ago

I have a hypothesis: this might be an uninitialized memory read (UMR) of workgroup shared memory. That would be consistent with zeroing the memory mitigating the problem, and would also explain why it manifests after a long running time - there may be a low probability that a particular garbage value causes an infinite loop.

It's somewhat frustrating, because decent tooling could help catch it, but we don't have that. A couple of things can be done. One is to carefully desk-check the shaders for UMR (I looked over coarse and didn't find anything, but I could have missed something, and it might be a different shader). Another is to deliberately inject garbage initial values (3735928559u etc.) and see if that changes behavior.
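
A sketch of that second idea, using the shared arrays from the snippets above (3735928559u is 0xDEADBEEF; which arrays to poison is illustrative, not prescriptive):

    // Poison workgroup memory so an uninitialized read shows up as a
    // recognizable garbage value instead of silently "working".
    sh_tile_count[local_id.x] = 3735928559u;
    sh_tile_base[local_id.x] = 3735928559u;
    workgroupBarrier();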

Another pathway is to get a repro case I can observe. The application of @94bryanr sounds promising if we can get that to happen on my machine.

It would be really good to get this tracked down.

sfjohnson commented 1 month ago

I recently upgraded from macOS 12 to 14 and now the issue is gone, even when zooming in close multiple times on with_winit. Interestingly, after the testing I did a few weeks ago, my system would randomly hang every few days without running Vello, with the same "WindowServer returned not alive" error. I'm thinking my Metal drivers were updated with a fix.

DJMcNab commented 1 month ago

Thanks for that report. I'm glad to hear it.

This is the third report we've received of this kind of hang happening on macOS 12, and the second of it being fixed by a macOS update. I don't think we can meaningfully take any action here. @XdaTk, please update your macOS version; I'm going to close this on the assumption that updating will fix it.

We can always re-open if that hypothesis is wrong.

94bryanr commented 1 month ago

My M2 Mac Pro is the 32GB version and it is running macOS Ventura 13.6.4. I'll go ahead and update my system to macOS Sonoma and see if anything changes. If not, I'll try digging into the code more to see if I can isolate the issue.

94bryanr commented 1 month ago

The problem persists even after updating to macOS Sonoma. Just to recap: I am experiencing the issue on an M2 Mac 32GB while using vello 0.2.1. The problem happens on macOS Ventura and on a fully updated macOS Sonoma. The issue does not happen on Windows. For most people experiencing this, it sounds like the issue is related to zooming in and out, but for me it only happens when the resolution of the rendering context (in my case an HTML canvas) is increased to nearly 4K.

I haven't been able to test the main branch yet, since it looks like most of the wgpu types are now re-exported under vello::wgpu, which required more refactoring than I was able to get done at the time. (Without sidetracking too much: please reconsider re-exporting those types, as it makes Vello take over the entire wgpu pipeline; I need to change all of my wgpu::GPU, wgpu::Device, etc. to vello::wgpu::*.) I'm going to take another look at this though.

@raphlinus You should be able to at least repro in the browser at https://www.cocube.com/console if you stretch the window to 4k and try scrolling up and down, but I'm not sure how valuable that will be.

And thanks for the amazing work on Vello and on this issue so far - very excited about the future of the project!

Update: It seems like the display freezing and requiring a reboot is no longer happening on the updated macOS version, but I am still seeing the visual artifact of nothing rendering below a certain line, with the line rising as the resolution is expanded.

dominikh commented 1 month ago

but I am still seeing the visual artifact of nothing rendering below a certain line, with the line rising the more the resolution is expanded.

That problem reproduces for me on Linux, but my assumption is that it's another instance of #366.

DJMcNab commented 1 month ago

Yes, I suspect that is probably one of the drive-by fixes I made in #606; give me half an hour to make a small PR fixing it. That is, segments wasn't properly write-protected; see #673. I'm slightly surprised that your scene is large enough to run past our bump buffer limits, based on the videos you've sent.

I haven't been able to test the main branch yet since it looks like most of the wgpu types are now re-exported under vello::wgpu, which required more refactoring than I was able to get done at the time (without sidetracking too much please reconsider re-exporting those types as it makes vello take over the entire wgpu pipeline. I need to change all of my wgpu::GPU and wgpu::Device etc to vello::wgpu::*.). I'm going to take another look at this over this though.

I don't understand what you're saying here, sorry. Our wgpu re-export shouldn't have any impact on whether you can add your own dependency on wgpu.

If the hangs are now not happening, then that vindicates the decision to close this issue.