junjiehe96 / FastInst

[CVPR2023] FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
MIT License

How to resume training from last checkpoint #33

Open sparshgarg23 opened 5 months ago

sparshgarg23 commented 5 months ago

Hi, I came across your work on instance segmentation and am currently trying to reproduce the results. I was previously able to train the model for 90,000 iterations, but when I tried resuming training from the last checkpoint, I got errors related to the configuration file not loading properly.

As I am new to detectron2, could you provide pointers on how to resume training from an existing checkpoint? Does the resume option expect a cfg file as an argument, or does it expect model weights? Thanks.

junjiehe96 commented 5 months ago

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 \
  --config-file /path/to/your_config.yaml \
  --resume MODEL.WEIGHTS /path/to/existing_checkpoint.pth

sparshgarg23 commented 5 months ago

Hmm, I tried that, but instead of resuming from iteration 70,000 it restarted training from iteration 0.
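
A possible explanation, assuming train_net.py follows the usual detectron2 pattern of calling trainer.resume_or_load(resume=args.resume): with --resume, the checkpointer looks for a text file named last_checkpoint inside cfg.OUTPUT_DIR and restores the checkpoint named there, including the optimizer state and iteration counter. If that file is missing (for example, because OUTPUT_DIR now points somewhere else), it falls back to MODEL.WEIGHTS and loads only the model weights, so the iteration counter starts again at 0. The sketch below mirrors that lookup so it can be checked before launching a run. The helper name find_resume_checkpoint is hypothetical and is not part of detectron2 or FastInst.

```python
import os


def find_resume_checkpoint(output_dir):
    """Return the checkpoint path that --resume would likely pick up, or None.

    Hypothetical helper (not a detectron2 API). It mirrors the lookup done by
    the fvcore Checkpointer: a text file named "last_checkpoint" in the output
    directory stores the file name of the most recently saved checkpoint.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if not os.path.exists(marker):
        # No marker file: --resume would fall back to MODEL.WEIGHTS and
        # load weights only, so training would start from iteration 0.
        return None
    with open(marker, "r") as f:
        last_saved = f.read().strip()
    return os.path.join(output_dir, last_saved)
```

Calling find_resume_checkpoint on the OUTPUT_DIR from the config (detectron2 defaults to "./output") and getting None would suggest the run cannot actually resume. In that case, pointing OUTPUT_DIR at the directory of the original run may be enough.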