How to resume training from last checkpoint

sparshgarg23 commented 5 months ago

Hi, I came across your work on instance segmentation and am currently trying to reproduce the results. I was previously able to train the model for 90,000 iterations, but when I tried resuming training from the last checkpoint, I got errors related to the configuration file not being loaded properly.

As I am new to detectron2, could you provide pointers on how to resume training from an existing checkpoint? Does the resume option expect a cfg file as an argument, or does it expect model weights? Thanks.

junjiehe96 commented 5 months ago

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 \
  --config-file /path/to/your_config.yaml \
  --resume MODEL.WEIGHTS /path/to/existing_checkpoint.pth

sparshgarg23 commented 5 months ago

Hmm, I tried that, but instead of resuming from iteration 70,000 it restarted training from iteration 0.
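
One possible cause, assuming train_net.py uses the stock detectron2 DefaultTrainer: with --resume, the trainer looks for a file named last_checkpoint in OUTPUT_DIR and, as far as I understand it, only restores the iteration counter when that file is present. If it is missing (for example, because OUTPUT_DIR changed or the checkpoint was moved), MODEL.WEIGHTS is loaded as plain initial weights and training starts at iteration 0. The sketch below only checks what a resume would find on disk; the helper name find_resume_checkpoint is hypothetical and not part of detectron2 or FastInst.

```python
import os


def find_resume_checkpoint(output_dir):
    """Return the checkpoint path recorded in `last_checkpoint`, or None.

    Assumption: the trainer writes a small text file called
    `last_checkpoint` into OUTPUT_DIR that contains the file name of the
    most recent checkpoint (e.g. `model_0089999.pth`).
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if not os.path.isfile(marker):
        # No marker file: a resume would have nothing to pick up here.
        return None

    with open(marker) as f:
        name = f.read().strip()

    path = os.path.join(output_dir, name)
    # The marker may point at a checkpoint that was moved or deleted.
    return path if os.path.isfile(path) else None
```

Calling find_resume_checkpoint on the OUTPUT_DIR from your config should return the path of the latest .pth file. If it returns None, copying the checkpoint back into OUTPUT_DIR and recreating last_checkpoint with its file name may be enough for --resume to continue from the saved iteration.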

junjiehe96 / FastInst

How to resume training from last checkpoint #33