hello-piger opened this issue 5 years ago
@hello-piger Yes, you can use the command line, but you will not benefit from distributed training that way.
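For context, the distributed training being referred to is the multi-GPU launch via PyTorch's launcher; a sketch, assuming 8 GPUs and the standard FCOS config paths:

```bash
# Multi-GPU training via torch.distributed.launch (sketch, assuming 8 GPUs).
# The launcher spawns one process per GPU; it also runs with a single
# process, but then there is no parallelism to gain.
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```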
OK, thank you for your reply!
What should I do if I want random initialization instead of using a pretrained network?
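If FCOS follows the maskrcnn-benchmark convention here (an assumption), the pretrained backbone is pulled in through the `MODEL.WEIGHT` config key, and clearing it makes the checkpointer load nothing, so the network keeps its random initialization:

```bash
# Sketch, assuming the maskrcnn-benchmark config convention applies:
# an empty MODEL.WEIGHT means no checkpoint is loaded, so all layers
# start from random initialization.
python tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    MODEL.WEIGHT "" \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x_scratch
```

Note that training a detector from scratch usually needs a considerably longer schedule than the pretrained 1x setting.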
Did you change the learning rate, batch size, and other hyperparameters in your training process? I train the model on a single GPU with the command `python tools/train_net.py --config-file configs/fcos/fcos_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/fcos_R_50_FPN_1x`. I changed the parameters just like in maskrcnn-benchmark, but there is a performance gap, and the total loss stays around 1.0. Do you have any suggestions for me?
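One common cause of such a gap is keeping the 8-GPU schedule while the effective batch size has shrunk. Under the linear scaling rule, going from 16 images per batch to 2 means dividing the learning rate by 8 and stretching the schedule by 8. A sketch, assuming the 1x config defaults of `IMS_PER_BATCH 16`, `BASE_LR 0.01`, `MAX_ITER 90000`, and `STEPS (60000, 80000)`:

```bash
# Sketch: single-GPU 1x schedule rescaled by the linear scaling rule.
# Batch 16 -> 2 is a factor of 8, so LR / 8 and iterations * 8.
python tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    SOLVER.IMS_PER_BATCH 2 \
    SOLVER.BASE_LR 0.00125 \
    SOLVER.MAX_ITER 720000 \
    SOLVER.STEPS "(480000, 640000)" \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```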
Did you solve it later? I trained with one GPU and the loss was about 1.0. Based on ResNet-50, I changed the hyperparameters according to the author's suggestions, but the final result was only 35.8 AP.
Hi! What dataset are you training on, COCO or something else? Also, which GPU model are you using?
I only have one GPU (a GTX 1060). Can I do distributed training with the following script?

```bash
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```
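For reference, `torch.distributed.launch` with `--nproc_per_node=1` will run, but it only spawns a single process, so there is no speedup over calling the training script directly. An equivalent plain single-process invocation would be (a sketch):

```bash
# Equivalent single-process run (sketch); with one GPU the launcher adds
# nothing, so the entry point can be called directly.
python tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```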