jshilong / DDQ

Dense Distinct Query for End-to-End Object Detection (CVPR 2023)
Apache License 2.0

Single GPU run #7

Closed · idonahum1 closed this 1 year ago

idonahum1 commented 1 year ago

Hi, thanks for the great work. I would like to ask whether you have tried to train a DDQ DETR model on a single GPU, since I get a CUDA out-of-memory error with batch_size = 2. This seems a little odd: your implementation used 8 GPUs × 2 samples per GPU, and my per-GPU setting is the same. Am I doing something wrong? My GPU is a Tesla V100 with 16 GB.

Thanks.

jshilong commented 1 year ago

> Hi, thanks for the great work. I would like to ask whether you have tried to train a DDQ DETR model on a single GPU, since I get a CUDA out-of-memory error with batch_size = 2. This seems a little odd: your implementation used 8 GPUs × 2 samples per GPU, and my per-GPU setting is the same. Am I doing something wrong? My GPU is a Tesla V100 with 16 GB.
>
> Thanks.

This is indeed a bit strange, because the log should show about 11 GB. You can try reducing the number of queries, for example from 900 to 500, and changing the aux query ratio to 1 (see https://github.com/jshilong/DDQ/blob/a166d18658b6b5b57621c00d6aa04e52a80e65bd/projects/models/ddq_detr.py#L220), which will not cause a significant performance loss.
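For reference, a minimal sketch of such an override as an MMDetection-style config. The field names `num_queries` and `dense_topk_ratio` are assumptions inferred from the linked line, so verify them against the repo before relying on this:

```python
# Hypothetical override config; inherit from the DDQ DETR config you train with.
# `num_queries` and `dense_topk_ratio` are assumed field names -- check them
# against projects/models/ddq_detr.py (the line linked above) in your checkout.
_base_ = './ddq-detr-5scale_r50_8xb2-12e_coco.py'

model = dict(
    num_queries=500,       # down from the default 900 to cut decoder memory
    dense_topk_ratio=1.0,  # aux query ratio of 1 instead of the default
)
```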

jshilong commented 1 year ago

Feel free to reopen the issue if there is any question

yxxxxx38324 commented 1 year ago

I have a similar problem. I use ddq-detr-5scale_r50_8xb2-12e_coco.py, and even with a batch size of 1 the memory occupation on a 3090 Ti is about 20 GB. Is there any way to reduce the memory occupation during training? Thanks a lot.

jshilong commented 1 year ago

> I have a similar problem. I use ddq-detr-5scale_r50_8xb2-12e_coco.py, and even with a batch size of 1 the memory occupation on a 3090 Ti is about 20 GB. Is there any way to reduce the memory occupation during training? Thanks a lot.

You can try reducing the number of queries, for example from 900 to 500, and changing the aux query ratio to 1 (see https://github.com/jshilong/DDQ/blob/a166d18658b6b5b57621c00d6aa04e52a80e65bd/projects/models/ddq_detr.py#L220), which will not cause a significant performance loss. Besides this, the 4-scale config is strong enough in most cases.
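A hedged sketch combining both suggestions for a single-GPU run. The 4-scale config filename is inferred from the 5-scale name quoted above and may differ in the actual repo; the field names are assumptions as before:

```python
# Hypothetical single-GPU config: 4-scale base plus reduced queries.
# The base filename mirrors ddq-detr-5scale_r50_8xb2-12e_coco.py and is
# not verified against the repo; field names are assumed as well.
_base_ = './ddq-detr-4scale_r50_8xb2-12e_coco.py'

model = dict(
    num_queries=500,       # fewer queries than the default 900
    dense_topk_ratio=1.0,  # aux query ratio of 1
)
```

With an MMDetection-style layout, a non-distributed run would then typically be launched with `python tools/train.py <this_config>.py` rather than the `dist_train.sh` launcher.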

yxxxxx38324 commented 1 year ago

> > I have a similar problem. I use ddq-detr-5scale_r50_8xb2-12e_coco.py, and even with a batch size of 1 the memory occupation on a 3090 Ti is about 20 GB. Is there any way to reduce the memory occupation during training? Thanks a lot.
>
> You can try reducing the number of queries, for example from 900 to 500, and changing the aux query ratio to 1 (see https://github.com/jshilong/DDQ/blob/a166d18658b6b5b57621c00d6aa04e52a80e65bd/projects/models/ddq_detr.py#L220), which will not cause a significant performance loss. Besides this, the 4-scale config is strong enough in most cases.

Thanks for the quick response. If I lower the number of queries, does that mean I can't use the weights you provide for fine-tuning?
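For anyone wondering the same thing, here is a purely illustrative sketch (not the maintainers' answer) of how one could check which pretrained parameters stop matching once the query count is reduced, and still load the rest. The config and checkpoint filenames are hypothetical:

```python
import torch
from mmdet.apis import init_detector  # MMDetection 3.x API

# Build the reduced-query model from a (hypothetical) modified config,
# without loading any weights yet.
model = init_detector('ddq-detr-5scale_r50_500q.py', checkpoint=None, device='cpu')

# Hypothetical path to the released weights.
ckpt = torch.load('ddq_detr_5scale_coco.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)

# List parameters whose shapes no longer match -- typically only the
# query-embedding tensors depend on the query count.
model_state = model.state_dict()
mismatched = {k for k, v in state.items()
              if k in model_state and model_state[k].shape != v.shape}
print('shape-mismatched keys:', sorted(mismatched))

# strict=False loads every compatible weight and leaves the mismatched
# (and missing) parameters at their fresh initialization.
model.load_state_dict({k: v for k, v in state.items() if k not in mismatched},
                      strict=False)
```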