training COCO on one GPU?

ucbdrive / few-shot-object-detection

Implementations of few-shot object detection benchmarks

Apache License 2.0

training COCO on one GPU? #124

Closed salehnia closed 2 years ago

salehnia commented 3 years ago

Hello, can you share your parameters for training on the COCO dataset with 1 GPU (12 GB memory)? I followed this article (https://arxiv.org/abs/1706.02677) and changed the settings as follows: GPUs 8 -> 1, lr 0.02 -> 0.0025, max_iter 90000 -> 720000.

But now I get this error: FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.

To avoid it I had to lower lr to 0.00000125, which changes the result: bAP is 0.0025 :(
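The numbers in this post follow the linear scaling rule from the linked article: when the total batch shrinks by a factor k, divide the learning rate by k and multiply the iteration count by k. A minimal sketch of that arithmetic is below. The function name `scale_schedule` is hypothetical and not part of this repository. It also assumes the per-GPU batch size stays fixed, so the total batch size must shrink by the same factor (in detectron2 this is SOLVER.IMS_PER_BATCH, which the post does not mention).

```python
def scale_schedule(base_lr, base_max_iter, base_gpus, target_gpus):
    """Linear scaling rule: with a k-times smaller total batch,
    use lr / k and k times as many iterations.

    Hypothetical helper for illustration; assumes the per-GPU batch
    size is unchanged, so the total batch shrinks by the same factor k.
    """
    if target_gpus <= 0 or base_gpus % target_gpus != 0:
        raise ValueError("base_gpus must be a positive multiple of target_gpus")
    k = base_gpus // target_gpus
    return {"lr": base_lr / k, "max_iter": base_max_iter * k}


# Values from the post: 8 GPUs -> 1 GPU
scaled = scale_schedule(base_lr=0.02, base_max_iter=90000, base_gpus=8, target_gpus=1)
print(scaled)  # {'lr': 0.0025, 'max_iter': 720000}
```

Any learning-rate decay steps in the schedule would presumably need the same factor k, since they are expressed in iterations.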

Wei-i commented 2 years ago

I hit the same problem, even with 8 GPUs and the learning rate set to 0.02. I also get the FloatingPointError and I am confused. Did you fix this bug?

This happens when training the base classes with the default command from readme.md:

python3 -m tools.train_net --num-gpus 8 \
        --config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_base1.yaml

Wei-i commented 2 years ago

From this issue, I think I figured it out: my installed detectron2 is the latest version, 0.5, while the default is 0.2.
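A quick way to check for this kind of mismatch before training is to read the installed package version. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes the package is installed under the distribution name "detectron2".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_release(installed, expected):
    """True if `installed` is `expected` or a patch release of it (0.2.1 for 0.2)."""
    if installed is None:
        return False
    core = installed.split("+")[0]  # drop local build tags such as "+cu101"
    return core == expected or core.startswith(expected + ".")


found = installed_version("detectron2")  # assumed distribution name
print("detectron2:", found, "| matches 0.2:", matches_release(found, "0.2"))
```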

alphacyp commented 2 years ago

Hello! I have the same problem. Did training run normally after you changed the detectron2 version?

Wei-i commented 2 years ago

Yes, you can try it.

salehnia commented 2 years ago

Thank you @Wei-i. I changed the detectron2 version to 0.2.1 and used the parameters from my first message.

alphacyp commented 2 years ago

Thank you very much for your reply. This problem is solved for now. Does the detectron2 version make the computed AP very low? Or, with detectron2 0.5, is the trained model fine while the version causes errors in the AP evaluation? Is that right? Also, for Stage 2: Few-Shot Fine-Tuning, I can't find fast_rcnn_R_101_FPN_ft_novel1_1shot.yaml. Do you have any solution?

Wei-i commented 2 years ago

The authors don't provide fast_rcnn_R_101_FPN_ft_novel1_1shot.yaml; you can find it in other issues.

thomasehuang commented 2 years ago

Thanks for the answers.

@alphacyp The exact reason is something we have to investigate, but we are planning on upgrading to the newest version of detectron2 (may take some time). The config file can be found in this issue. See my answer there.