qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in Pytorch 1.0 / Pytorch 0.4. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License
1.39k stars, 530 forks

Training not starting... #144

Closed by christnp 3 years ago

christnp commented 3 years ago

Hello,

I'm trying to follow the tutorial you put together, but it seems to never even start training. It gets to this logging message, then never makes any progress (no output or anything).

2020-12-05 00:55:45,914 - root - INFO - Start training from epoch 0.

I've traced it down to this line: https://github.com/qfgaohao/pytorch-ssd/blob/master/train_ssd.py#L116

If it helps, this is the terminal call to train_ssd.py that I'm making:

# fine-tune the SSD model on the new data
python pytorch-ssd/train_ssd.py \
   --dataset_type open_images \
   --datasets data/trainval/open_images \
   --net mb1-ssd \
   --pretrained_ssd pytorch-ssd/models/mobilenet-v1-ssd-mp-0_675.pth \
   --scheduler cosine \
   --lr 0.01 \
   --t_max 100 \
   --validation_epochs 5 \
   --num_epochs 100 \
   --base_net_lr 0.001 \
   --batch_size 5

Note 1: The Open Images data and the pretrained model are both downloaded to the correct directories.
Note 2: This is run from a Jupyter notebook.

christnp commented 3 years ago

Disregard. It seems to be an issue with Jupyter not printing to the console beyond the line above. When I run the command from a terminal (on the same GCP instance running the notebook) I can see the output.
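For anyone who wants to stay in the notebook, one possible workaround is to launch the script as a child process and echo its output line by line. This is only a sketch: it assumes the missing output comes from buffering or from how the notebook captures the child's streams, which this thread does not confirm. `stream_command` is a hypothetical helper name, not part of pytorch-ssd.

```python
import subprocess
import sys


def stream_command(cmd):
    """Run cmd, echo each output line as it arrives, return (exit_code, lines).

    Hypothetical helper for illustration; not part of pytorch-ssd.
    """
    lines = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines are not dropped
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    )
    for line in proc.stdout:
        print(line, end="", flush=True)  # show progress in the notebook cell
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines


# Small self-check with a child process that prints two lines.
# "-u" asks the child Python interpreter not to buffer stdout/stderr.
exit_code, output = stream_command(
    [sys.executable, "-u", "-c", "print('step 1'); print('step 2')"]
)
```

To try it on the training run, pass the same arguments as a list, for example `[sys.executable, "-u", "pytorch-ssd/train_ssd.py", "--dataset_type", "open_images", ...]` with the remaining flags from the command above. If the output still stalls, running from a terminal, as described above, remains the confirmed fix.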