Deleting instance causes segmentation fault #969

Closed by daskandalis 1 year ago

daskandalis commented 1 year ago

Bug description

Deleting instance causes segmentation fault on next action.

Expected behaviour

I added an extra instance with Ctrl+I, then right-clicked to hide all landmarks in that instance, which deleted the instance. I expected to be able to continue digitising.

Actual behaviour

Slightly variable but may include:

Nothing appearing when adding instance to next frame.

Subsequent clicking causes segmentation fault and core dump.

Your personal set up

SLEAP: 1.2.8
TensorFlow: 2.8.3
Numpy: 1.21.5
Python: 3.7.13
OS: Linux-5.15.0-48-generic-x86_64-with-debian-bookworm-sid

Logs

```
Saving config: /home/usr/.sleap/1.2.8/preferences.yaml
2022-09-26 09:31:27.657784: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 09:31:27.686165: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 09:31:27.686360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Software versions:
SLEAP: 1.2.8
TensorFlow: 2.8.3
Numpy: 1.21.5
Python: 3.7.13
OS: Linux-5.15.0-48-generic-x86_64-with-debian-bookworm-sid

Happy SLEAPing! :)
Segmentation fault (core dumped)
```


How to reproduce

  1. Create new project
  2. Add video
  3. Create skeleton (e.g. two landmarks)
  4. Add instance
  5. Add instance
  6. Right click landmarks until instance disappears
  7. Next action will usually cause a segmentation fault (e.g. adding instance, generating suggested frames)

roomrys commented 1 year ago

First-Pass Analysis

The segmentation fault (program crash) seems to be caused by hidden user-labeled instances. This issue is also mentioned in #971.

Segmentation Fault

This seems to happen because we do not add user-labeled instances to the scene when all of their nodes are hidden. When I change the conditional in https://github.com/talmolab/sleap/blob/develop/sleap/gui/widgets/video.py#L449-L466 so that user-labeled instances are always added to the scene:

if instance.instance.n_visible_points > 0 or not isinstance(instance.instance, PredictedInstance):

            # connect signal so we can adjust QtNodeLabel positions after zoom

then the program no longer crashes.

I was able to replicate the issue. However, my terminal gives no error message; the program just exits:

(sleap_v128) λ sleap-label
Saving config: C:\Users\Liezl/.sleap/1.2.8/preferences.yaml
Restoring GUI state...

Software versions:
SLEAP: 1.2.8
TensorFlow: 2.8.3
Numpy: 1.21.5
Python: 3.7.13
OS: Windows-10-10.0.19041-SP0

Happy SLEAPing! :)

Steps to replicate (same as described by @daskandalis)

  1. New Project
  2. Add video
  3. Create 2 node skeleton, no edges
  4. Add 2 instances on first frame
  5. Right click all nodes on a single instance until instance disappears
  6. Do an action (either right click to add a new instance [default, random, average], or generate suggestions [sample]; either leads to the issue)

Hidden User-Labeled Instances

User-labeled instances with all nodes set as invisible should still be visible. We currently set instances with all NaNs to be hidden from view. This was originally implemented to hide predicted instances which had all low-scoring nodes (these low-scoring nodes are also denoted by NaNs).
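
To make the current rule concrete, here is a minimal sketch (not SLEAP's code) of how an instance whose nodes are all NaN ends up hidden. The helper names `count_visible_nodes` and `is_hidden_under_current_rule`, and the plain list-of-tuples point representation, are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: one (x, y) pair per node, NaN meaning "not visible".
Point = Tuple[float, float]


def count_visible_nodes(points: Sequence[Point]) -> int:
    """Count nodes whose x and y are both real numbers (not NaN)."""
    return sum(1 for x, y in points if not (math.isnan(x) or math.isnan(y)))


def is_hidden_under_current_rule(points: Sequence[Point]) -> bool:
    """An instance with no visible nodes is hidden, whoever created it."""
    return count_visible_nodes(points) == 0


nan = float("nan")
partially_labeled = [(10.0, 20.0), (nan, nan)]  # one node still visible
all_hidden = [(nan, nan), (nan, nan)]           # every node right-clicked away
```

Under this rule `all_hidden` is dropped from view even if a user created it, which matches the disappearing instance in the steps above.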


We can add an extra conditional that ensures user-labeled instances are always added while PredictedInstances are still subject to a minimum visible node limit.
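
A minimal sketch of that conditional, using stand-in classes rather than SLEAP's real ones. `should_add_to_scene` and `min_visible_points` are hypothetical names; only `n_visible_points` and `PredictedInstance` come from the snippet quoted earlier in this comment.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Stand-in for a user-labeled instance (assumption, not SLEAP's class)."""
    n_visible_points: int


class PredictedInstance(Instance):
    """Stand-in for a model prediction; low-scoring nodes count as not visible."""


def should_add_to_scene(instance: Instance, min_visible_points: int = 1) -> bool:
    """Decide whether an instance gets drawn in the scene.

    User-labeled instances are always added, even with every node hidden.
    Predicted instances still need at least `min_visible_points` visible nodes.
    """
    if not isinstance(instance, PredictedInstance):
        return True
    return instance.n_visible_points >= min_visible_points
```

With this check, `should_add_to_scene(Instance(n_visible_points=0))` is true, while a `PredictedInstance` with zero visible nodes is still filtered out.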

daskandalis commented 1 year ago

I replicated this issue in Windows 11 Anaconda Prompt (anaconda3) with the same steps as above:

(sleap) C:\Users\daska\Desktop\PROJECTS\sleap\sleap>sleap-label
Saving config: C:\Users\daska/.sleap/1.2.8/preferences.yaml
2022-09-26 20:23:35.137862: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-09-26 20:23:35.138357: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-09-26 20:23:35.141271: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: TOKAMAK-IX
2022-09-26 20:23:35.141412: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: TOKAMAK-IX

Software versions:
SLEAP: 1.2.8
TensorFlow: 2.8.3
Numpy: 1.21.5
Python: 3.7.13
OS: Windows-10-10.0.22000-SP0

Happy SLEAPing! :)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\daska\anaconda3\envs\sleap\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\daska\anaconda3\envs\sleap\lib\multiprocessing\spawn.py", line 113, in _main
    preparation_data = reduction.pickle.load(from_parent)
EOFError: Ran out of input