hello-piger opened this issue 5 years ago
@hello-piger Yes, you can use the command line, but you will not benefit from distributed training that way.
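For context, the distributed training being referred to is the multi-GPU launch via PyTorch's launcher; a sketch, assuming 8 GPUs and the standard FCOS config paths:

```bash
# Multi-GPU training via torch.distributed.launch (sketch, assuming 8 GPUs).
# The launcher spawns one process per GPU; it also runs with a single
# process, but then there is no parallelism to gain.
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```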
OK, thank you for your reply!
What should I do if I want random initialization instead of using a pretrained network?
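If FCOS follows the maskrcnn-benchmark convention here (an assumption), the pretrained backbone is pulled in through the `MODEL.WEIGHT` config key, and clearing it makes the checkpointer load nothing, so the network keeps its random initialization:

```bash
# Sketch, assuming the maskrcnn-benchmark config convention applies:
# an empty MODEL.WEIGHT means no checkpoint is loaded, so all layers
# start from random initialization.
python tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    MODEL.WEIGHT "" \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x_scratch
```

Note that training a detector from scratch usually needs a considerably longer schedule than the pretrained 1x setting.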
Did you change the learning rate, batch size, and other hyperparameters in your training process? I train the model on a single GPU with the command `python tools/train_net.py --config-file configs/fcos/fcos_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/fcos_R_50_FPN_1x`. I changed the parameters just like in maskrcnn-benchmark, but there is a performance gap, and the total loss stays around 1.0. Do you have any suggestions for me?
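One common cause of such a gap is keeping the 8-GPU schedule while the effective batch size has shrunk. Under the linear scaling rule, going from 16 images per batch to 2 means dividing the learning rate by 8 and stretching the schedule by 8. A sketch, assuming the 1x config defaults of `IMS_PER_BATCH 16`, `BASE_LR 0.01`, `MAX_ITER 90000`, and `STEPS (60000, 80000)`:

```bash
# Sketch: single-GPU 1x schedule rescaled by the linear scaling rule.
# Batch 16 -> 2 is a factor of 8, so LR / 8 and iterations * 8.
python tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    SOLVER.IMS_PER_BATCH 2 \
    SOLVER.BASE_LR 0.00125 \
    SOLVER.MAX_ITER 720000 \
    SOLVER.STEPS "(480000, 640000)" \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```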
Did you solve it later? I trained with one GPU and the loss was about 1.0. Based on ResNet-50, I changed the hyperparameters according to the author's suggestions, but the final result was only 35.8 AP.
Hi! What dataset are you training on, COCO or something else? Also, which GPU model are you using?
I only have one GPU (a GTX 1060). Can I do distributed training with the following script?

```bash
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```
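For reference, `torch.distributed.launch` with `--nproc_per_node=1` will run, but it only spawns a single process, so there is no speedup over calling the training script directly. An equivalent plain single-process invocation would be (a sketch):

```bash
# Equivalent single-process run (sketch); with one GPU the launcher adds
# nothing, so the entry point can be called directly.
python tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```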