jiaxi-wu / MPSR

Multi-scale Positive Sample Refinement for Few-shot Object Detection, ECCV2020
MIT License

questions about multi-gpu training. #4

Open hzhupku opened 4 years ago

hzhupku commented 4 years ago

Hello, thank you for your helpful code. In your code you use two GPUs, with img_per_batch set to 4 and max_iter set to 36000. When I use 8 GPUs, img_per_batch becomes 16, but max_iter is still 36000, which takes a long time to train. I believe this is because IterationBasedBatchSampler is used. Should I modify max_iter in the config file for the 8-GPU setting? Will the reduced number of iterations cause a decline in performance?

jiaxi-wu commented 4 years ago

You need to change the cfg files, e.g.:

SOLVER:
  BASE_LR: 0.02
  WEIGHT_DECAY: 0.0001
  STEPS: (6000, 8000)
  MAX_ITER: 9000
  IMS_PER_BATCH: 16
TEST:
  IMS_PER_BATCH: 8

In theory this should have no influence on performance. Please contact us if you find any interesting results.
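The config above follows the linear scaling rule: the effective batch size grows 4× (4 → 16), so the learning rate is multiplied by 4 and the iteration counts are divided by 4, keeping the number of training epochs constant. A minimal sketch of that arithmetic, assuming the 2-GPU baseline used BASE_LR 0.005 and STEPS (24000, 32000) (these baseline values are inferred from the 4× factor, not stated in this thread; `scale_solver` is a hypothetical helper, not part of MPSR):

```python
def scale_solver(base_lr, max_iter, steps, old_batch, new_batch):
    """Rescale solver settings when the effective batch size changes.

    Linear scaling rule: LR scales with batch size; iteration counts
    scale inversely, so the total number of epochs stays the same.
    """
    factor = new_batch / old_batch
    return {
        "BASE_LR": base_lr * factor,
        "MAX_ITER": round(max_iter / factor),
        "STEPS": tuple(round(s / factor) for s in steps),
    }

# 2-GPU baseline (IMS_PER_BATCH=4) -> 8-GPU run (IMS_PER_BATCH=16)
cfg = scale_solver(0.005, 36000, (24000, 32000), old_batch=4, new_batch=16)
print(cfg)  # {'BASE_LR': 0.02, 'MAX_ITER': 9000, 'STEPS': (6000, 8000)}
```

This reproduces the SOLVER values suggested above under the stated baseline assumption.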

hzhupku commented 4 years ago

Thank you for your reply~

hzhupku commented 4 years ago

The parameters you provided above cause a NaN loss during training. I guess it still requires some experiments to find the right parameters.

jiaxi-wu commented 4 years ago

Sorry, I don't have an 8-GPU machine, so I can't tune these hyperparameters myself. I suggest trying a lower learning rate (e.g. BASE_LR=0.01 or 0.005) to avoid the NaN loss here.
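A longer linear warmup is another common remedy for NaN losses when the batch size and learning rate are scaled up. A sketch of such a config tweak, assuming MPSR inherits maskrcnn-benchmark's SOLVER.WARMUP_* options (the exact values here are illustrative, not tested):

```yaml
SOLVER:
  BASE_LR: 0.01          # halved from 0.02, as suggested above
  WARMUP_METHOD: "linear"
  WARMUP_FACTOR: 0.001   # start at BASE_LR * 0.001
  WARMUP_ITERS: 1000     # ramp up over the first 1000 iterations
```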