CUDA out of memory when the sequence lasts too long

timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers" [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]

https://arxiv.org/abs/2101.02702

Apache License 2.0

487 stars, 113 forks

CUDA out of memory when the sequence lasts too long #101

Open czhaneva opened 1 year ago

czhaneva commented 1 year ago

Instructions To Reproduce the 🐛 Bug:

what changes you made (git diff) or what code you wrote
```
None
```
what exact command you run: CUDA_VISIBLE_DEVICES=0 python src/track.py with reid dataset_name=DEMO data_root_dir=${IMG_PATH} output_dir=${OUTPUT_PATH} write_images=True frame_range.start=0 frame_range.end=1.0
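
A possible stopgap, not proposed in the thread, is to run track.py on consecutive slices of the sequence so that each run starts with an empty track memory. The sketch below assumes that frame_range.start and frame_range.end are fractions of the sequence length (the command above uses frame_range.end=1.0). The helper chunked_track_commands and the chunk_NN output directories are hypothetical. Track IDs restart in every run, so tracks that cross a chunk boundary would need to be stitched together afterwards.

```python
import shlex


def chunked_track_commands(num_chunks, img_path, output_path):
    """Build one track.py command per slice of the sequence (hypothetical helper).

    Assumes frame_range.start/end are fractions of the sequence length.
    """
    commands = []
    for i in range(num_chunks):
        start = i / num_chunks
        end = (i + 1) / num_chunks
        args = [
            "python", "src/track.py", "with", "reid",
            "dataset_name=DEMO",
            f"data_root_dir={img_path}",
            # Separate output directory per chunk so runs do not overwrite each other.
            f"output_dir={output_path}/chunk_{i:02d}",
            "write_images=True",
            f"frame_range.start={start}",
            f"frame_range.end={end}",
        ]
        # Quote each argument so paths with spaces stay intact in a shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


for cmd in chunked_track_commands(4, "/data/imgs", "/data/out"):
    print(cmd)
```

Each printed command can be prefixed with CUDA_VISIBLE_DEVICES=0 and run sequentially, as in the original invocation.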

what you observed (including full logs):

GPU memory usage keeps increasing as the program runs until a "CUDA out of memory" error is reported.

please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.

My video sequence has more than 10,000 frames, and each frame contains about 10 people. I think the key to the problem is the steadily increasing GPU memory usage.

Expected behavior:

GPU memory usage should remain stable.
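
To check whether memory really grows with the number of processed frames (rather than spiking on a single crowded frame), one could log torch.cuda.memory_allocated(), which returns bytes, after each frame and fit a line to the readings. Below is a pure-Python sketch of the fitting and extrapolation step; the function names are hypothetical and the readings are assumed to already be converted to MB. A slope near zero matches the expected stable behavior, while a positive slope points to per-frame accumulation.

```python
def memory_growth_per_frame(readings_mb):
    """Least-squares slope (MB per frame) of per-frame memory readings.

    readings_mb[i] is the memory in MB sampled after frame i, e.g.
    torch.cuda.memory_allocated() / 2**20 logged inside the tracking loop.
    """
    n = len(readings_mb)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings_mb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings_mb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def frames_until_oom(current_mb, capacity_mb, slope_mb_per_frame):
    """Rough linear extrapolation of how many more frames fit in GPU memory."""
    if slope_mb_per_frame <= 0:
        return None  # memory is not growing, so no OOM is predicted
    return int((capacity_mb - current_mb) / slope_mb_per_frame)
```

The linear fit is only a first approximation: if memory grows with the number of stored tracks rather than frames, the curve may bend upward in crowded scenes.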

Environment:

Provide your environment information using the following command:

My environment is the same as in INSTALL.md, except for pytorch=1.5.1 and torchvision=0.6.1.

HenryZhou19 commented 11 months ago

I encountered exactly the same bug.

timmeinhardt commented 11 months ago

This might indeed be a bug. We never tested the codebase on sequences with that many frames. During inference, all previous tracks are kept in memory, so over 10,000 frames (or an unbounded number of frames) they accumulate. One could try to move tracks that are already past the re-identification window to CPU memory.
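
One reading of this suggestion, sketched under assumptions: each frame, inactive tracks that have not been seen for longer than the re-identification window are split off and their state is moved off the GPU. The Track class, offload_stale_tracks, reid_window, and the to_cpu callback are all hypothetical and do not reflect TrackFormer's actual tracker code. to_cpu is passed in so the bookkeeping can be shown without PyTorch; with real tensors it would be something like lambda s: s.detach().cpu().

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Track:
    # Hypothetical minimal track record, not TrackFormer's actual class.
    track_id: int
    last_seen_frame: int
    state: Any  # e.g. query embedding / box tensors living on the GPU


def offload_stale_tracks(
    inactive: List[Track],
    frame_idx: int,
    reid_window: int,
    to_cpu: Callable[[Any], Any],
) -> Tuple[List[Track], List[Track]]:
    """Split inactive tracks into those still eligible for re-identification
    and those past the window, moving the latter's state off the GPU."""
    keep, archived = [], []
    for track in inactive:
        if frame_idx - track.last_seen_frame > reid_window:
            # Too old to be re-identified: free its GPU memory.
            track.state = to_cpu(track.state)
            archived.append(track)
        else:
            keep.append(track)
    return keep, archived
```

Offloading only moves the growth from GPU to host memory. For truly unbounded streams, archived tracks would eventually need to be written to disk or dropped once their results have been saved.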

HenryZhou19 commented 11 months ago

Hi Tim, thank you very much for your time and attention. Here is what happened to me: when I ran the pre-training as described in TRAIN.md ("python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame"), GPU memory usage kept increasing slowly until CUDA ran out of memory, and the training failed.

timmeinhardt commented 11 months ago

@HenryZhou19 This is not the same problem as the one described in the first message of this issue. The original problem was a CUDA out-of-memory error during inference, not training.

HenryZhou19 commented 11 months ago

Sorry about that. I hope my problem can still help locate the bug.

chamathabeysinghe commented 8 months ago

Hi @czhaneva, did you resolve this issue? I am facing the same problem.

imzhangyd commented 5 months ago

Hi Tim, thank you very much for your time and attention. Here is what happened to me: When I tried to run the pre-training as TRAIN.md says:" python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame ", the cost of GPU's memory kept increasing slowly until CUDA out of memory, and the training just failed.

Hi, I also ran into this problem. Have you solved it? Could you please help me with it? @HenryZhou19 @timmeinhardt
