hkchengrex / Tracking-Anything-with-DEVA

[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
https://hkchengrex.com/Tracking-Anything-with-DEVA/

Error in the middle of a video. #67

Closed Mountchicken closed 8 months ago

Mountchicken commented 8 months ago

Hi @hkchengrex, thanks for the great work. I am trying to track dense objects in an image, e.g. more than 100 objects per image. The code runs successfully for the first 50 frames. However, at the 51st frame, an error occurs:

Traceback (most recent call last):
  File "Tracking-Anything-with-DEVA/demo/demo_with_trex2.py", line 116, in <module>
    process_frame(deva,
  File "/home/jiangqing/miniconda3/envs/deva/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "Tracking-Anything-with-DEVA/deva/ext/with_text_processor.py", line 91, in process_frame_with_text
    prob = deva.incorporate_detection(this_image, mask,
  File "Tracking-Anything-with-DEVA/deva/inference/inference_core.py", line 193, in incorporate_detection
    self._add_memory(image, ms_features, self.last_mask, key, shrinkage, selection)
  File "Tracking-Anything-with-DEVA/deva/inference/inference_core.py", line 73, in _add_memory
    value, sensory = self.network.encode_mask(image,
  File "Tracking-Anything-with-DEVA/deva/model/network.py", line 54, in encode_mask
    g16, h16 = self.mask_encoder(image,
  File "/home/jiangqing/miniconda3/envs/deva/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "Tracking-Anything-with-DEVA/deva/model/big_modules.py", line 108, in forward
    g_chunk = self.bn1(g_chunk)  # 1/2, 64
  File "/home/jiangqing/miniconda3/envs/deva/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jiangqing/miniconda3/envs/deva/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
    return F.batch_norm(
  File "/home/jiangqing/miniconda3/envs/deva/lib/python3.10/site-packages/torch/nn/functional.py", line 2438, in batch_norm
    return torch.batch_norm(
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Tracking-Anything-with-DEVA/deva/inference/image_feature_store.py:48: UserWarning: Leaking dict_keys([51, 50, 52]) in the image feature store

I am not familiar with the module image_feature_store.py. Can you share some insight into this bug? BTW, here are some visualization results from the model's output. It only worked for the first 50 frames 😂 [images: 00000019, 00000030]

hkchengrex commented 8 months ago

This error is not related to image_feature_store; the cause is that some of the inputs are non-contiguous. I tested the automatic SAM demo on this SORA video and it ran fine for 100+ frames (SAM failed to detect a lot of the objects, but no errors were thrown). There might be a problem with your demo_with_trex2 implementation. Notably, after 50 frames we start to "remove" objects from the scene, which might lead to non-contiguous output on your end if you are referring to the object indices in any way. The easiest fix might just be to slap .contiguous() on suspicious tensors.
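
For readers unfamiliar with the term, here is a minimal sketch of what "non-contiguous" means in PyTorch and how .contiguous() fixes it. It is not taken from the DEVA code base: the tensor name `masks`, its shape, and the two ways of producing a non-contiguous view are assumptions chosen only for illustration, and the real culprit in demo_with_trex2 may be a different tensor.

```python
import torch

# Hypothetical stack of per-object masks with shape (num_objects, H, W).
# The name and shape are illustrative, not taken from DEVA.
masks = torch.arange(4 * 6 * 8, dtype=torch.float32).reshape(4, 6, 8)

# Two common ways to end up with a non-contiguous view:
kept = masks[::2]              # strided slice: keeps objects 0 and 2
hwc = masks.permute(1, 2, 0)   # reordered dimensions: (H, W, num_objects)

for name, t in [("masks", masks), ("kept", kept), ("hwc", hwc)]:
    print(name, tuple(t.shape), "contiguous:", t.is_contiguous())

# .contiguous() copies the data into a dense layout with the same values.
kept_fixed = kept.contiguous()
hwc_fixed = hwc.contiguous()

# On a tensor that is already contiguous, no copy is made.
same_storage = masks.contiguous().data_ptr() == masks.data_ptr()
```

Checking `is_contiguous()` on the image and mask tensors just before they are passed to `incorporate_detection` is one way to locate a "suspicious" tensor before adding `.contiguous()` to it.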

hkchengrex commented 8 months ago

Feel free to re-open if there are follow-up questions.