rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Freeze + Crash while logging 3d-data #4211

Closed playreplayoliver closed 5 days ago

playreplayoliver commented 11 months ago

Describe the bug: Freeze and crash.

To Reproduce: It happens quite often, but it is hard to reproduce intentionally.

The setup is:

Backtrace

thread 'ThreadId(1)' panicked at 'Error in Surface::get_current_texture_view: Validation Error

Caused by:
    Parent device is lost

wgpu-0.17.0/src/backend/direct.rs:815
stack backtrace:
   6: core::panicking::panic_fmt
             at core/src/panicking.rs:67:14
   7: wgpu::backend::direct::Context::handle_error_fatal

Desktop:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

Rerun version: rerun_py 0.10.0 [rustc 1.72.1 (d5c2e9c34 2023-09-13), LLVM 16.0.5] x86_64-unknown-linux-gnu release-0.10.0 feea69f, built 2023-10-30T16:58:15Z

Additional context: The crash is preceded by a freeze of 5+ seconds, which reminds me a bit of a previous deadlock bug in the rerun-viewer.

Wumpf commented 11 months ago

"Device lost" usually indicates an issue in the driver. Which GPU and driver are you running? Did you already notice slowdowns before the crash? From the way you describe it (a 5s freeze), this is likely a TDR (Timeout Detection & Recovery), which happens if something causes too much GPU load. To clarify, is your whole system freezing?

In any case, we should figure out how to handle this better: reattempt device creation, dump more information, give a better error message, etc.
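
As a rough illustration of the "reattempt device creation and report more information" idea, here is a minimal sketch of a bounded retry loop. It does not use wgpu's actual API and is not how rerun handles this: `GpuError`, `RecoveryReport`, and `recover_device` are hypothetical names invented for this example, and the `create_device` closure stands in for whatever real device-creation call an application would make.

```rust
use std::fmt;

/// Hypothetical error type; a real backend would report its own errors.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum GpuError {
    /// The device was lost (e.g. after a driver reset).
    DeviceLost,
    /// Any other failure, treated here as not worth retrying.
    Other(String),
}

impl fmt::Display for GpuError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GpuError::DeviceLost => write!(f, "device lost"),
            GpuError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

/// Hypothetical summary of a recovery attempt, suitable for an error report.
#[derive(Debug, PartialEq)]
struct RecoveryReport {
    attempts: u32,
    recovered: bool,
    log: Vec<String>,
}

/// Calls `create_device` up to `max_attempts` times.
/// Retries only on `DeviceLost`; any other error stops immediately.
fn recover_device<F>(max_attempts: u32, mut create_device: F) -> RecoveryReport
where
    F: FnMut(u32) -> Result<(), GpuError>,
{
    let mut log = Vec::new();
    for attempt in 1..=max_attempts {
        match create_device(attempt) {
            Ok(()) => {
                log.push(format!("attempt {}: device re-created", attempt));
                return RecoveryReport { attempts: attempt, recovered: true, log };
            }
            Err(GpuError::DeviceLost) => {
                log.push(format!("attempt {}: device lost again, retrying", attempt));
            }
            Err(err) => {
                log.push(format!("attempt {}: giving up: {}", attempt, err));
                return RecoveryReport { attempts: attempt, recovered: false, log };
            }
        }
    }
    RecoveryReport { attempts: max_attempts, recovered: false, log }
}
```

The returned `log` is the "dump more information" part: instead of panicking on the first failure, the caller can show the user what was tried before giving up.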

playreplayoliver commented 11 months ago

> To clarify, is your whole system freezing?

No, only the rerun-viewer.

32,0 GiB RAM
12th Gen Intel® Core™ i7-1255U × 12
Mesa Intel® Graphics (ADL GT2)

hwinfo --gfxcard
25: PCI 02.0: 0300 VGA compatible controller (VGA)              
  [Created at pci.386]
  Unique ID: _Znp.isTakwJ_oa8
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel VGA compatible controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x4628 
  SubVendor: pci 0x17aa "Lenovo"
  SubDevice: pci 0x50ac 
  Revision: 0x0c
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0x601c000000-0x601cffffff (rw,non-prefetchable)
  Memory Range: 0x4000000000-0x400fffffff (ro,non-prefetchable)
  I/O Ports: 0x3000-0x303f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 175 (115660166 events)
  Module Alias: "pci:v00008086d00004628sv000017AAsd000050ACbc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Wumpf commented 6 days ago

@playreplayoliver are you still hitting this? There have been relevant fixes in wgpu by now, but tbh I lost track of all the details.

playreplayoliver commented 5 days ago

> @playreplayoliver are you still hitting this? There have been relevant fixes in wgpu by now, but tbh I lost track of all the details.

No, not really. I think we can close this. If I encounter this again, I will create another issue.