ifzhang / ByteTrack

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
MIT License

"CUDA out of memory" when training with 300 classes dataset. #197

Open igorvishnevskiy opened 2 years ago

igorvishnevskiy commented 2 years ago

Let me mention that training on a dataset with 6K inputs and 1 class works great. However, training on the same ~6000-image dataset with 300 classes causes the following error:

------------CPU Mode for This Batch-------------
2022-05-25 13:52:53 | INFO     | yolox.models.yolo_head:335 - OOM RuntimeError is raised due to the huge memory cost during label assignment. 
CPU mode is applied in this batch. If you want to avoid this issue, try to reduce the batch size or image size.

Training continues for some time, then quits completely with "CUDA out of memory":

RuntimeError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 7.79 GiB total capacity; 6.20 GiB already allocated; 21.44 MiB free; 6.26 GiB reserved in total by PyTorch)
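For anyone debugging the same crash: these are standard torch.cuda allocator calls (nothing ByteTrack-specific) that can be dropped around the training step to watch usage climb toward the 7.79 GiB limit:

```python
import torch

# Standard torch.cuda allocator stats; print around the training step
# to see how much is allocated vs. reserved before the OOM hits.
dev = torch.device("cuda:0")
print(f"allocated: {torch.cuda.memory_allocated(dev) / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(dev) / 2**20:.0f} MiB")
print(torch.cuda.memory_summary(dev, abbreviated=True))
```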

I'm trying to fix it; help from anyone else is welcome. Please drop your solutions/thoughts here. As soon as I find a solution, I will share it here too. Thank you.

I'm running on 2 GPUs, a GTX 1070 and an RTX 3070, which should be plenty. The platform needs more optimization.

P.S. Lowering the batch size doesn't help. I set the batch to "-b 2" and devices to "-d 2", i.e. a batch of 1 per GPU. Can't get lower than that.

Image size is set to:

self.input_size = (256, 512)
self.test_size = (256, 512)

Also very low res.
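For reference, those values live in my Exp file. A minimal sketch of the relevant overrides (field names follow the YOLOX Exp convention that ByteTrack uses; dataset paths and the rest are omitted):

```python
from yolox.exp import Exp as BaseExp

# Minimal sketch of the relevant Exp overrides; field names follow the
# YOLOX Exp convention used by ByteTrack. Dataset fields omitted.
class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.num_classes = 300
        self.input_size = (256, 512)  # (height, width)
        self.test_size = (256, 512)
```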

igorvishnevskiy commented 2 years ago

Just tried cutting the inputs down to 10 images while keeping all 300 classes, and the issue still reproduces: same low resolution, same batch of 1 per GPU. The issue is definitely caused by the high number of classes.
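For context on why the class count dominates: below is my paraphrase of the SimOTA classification-cost step in yolox/models/yolo_head.py (a sketch of my reading of the code, not verbatim; check your version). Every tensor carries a trailing num_classes dimension, so the temporaries grow linearly with the number of classes:

```python
import torch
import torch.nn.functional as F

# Paraphrased sketch of SimOTA's pairwise classification cost (not the
# verbatim YOLOX code). With a 256x512 input and strides 8/16/32 there
# are 2048 + 512 + 128 = 2688 anchors; for, say, 100 GT boxes, each
# (num_gt, num_anchors, num_classes) float32 temporary is ~308 MiB at
# 300 classes vs. ~1 MiB at 1 class, and several are alive at once.
def pairwise_cls_cost(gt_classes, cls_preds, obj_preds, num_classes):
    num_gt = gt_classes.shape[0]
    num_anchors = cls_preds.shape[0]
    # One-hot GT labels broadcast to (num_gt, num_anchors, num_classes)
    gt_cls = (
        F.one_hot(gt_classes.long(), num_classes)
        .float()
        .unsqueeze(1)
        .repeat(1, num_anchors, 1)
    )
    # Class scores times objectness, same (num_gt, num_anchors, num_classes) shape
    preds = (
        cls_preds.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid()
        * obj_preds.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid()
    )
    # Elementwise BCE allocates yet another tensor of the same shape
    return F.binary_cross_entropy(preds.sqrt(), gt_cls, reduction="none").sum(-1)
```

If that reading is right, it also explains why shrinking the dataset to 10 images changes nothing: the cost tensors depend on classes, anchors, and GT boxes per image, not on dataset size.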