Hi, please add -c "checkpoint_pth" to the training command, e.g.:

NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4 --master_port 29502 train.py -p 29502 -d 0,1,2,3 -n "dataset_name" -c "checkpoint_pth"

Thanks.
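For reference, below is a minimal sketch of how a -c/--checkpoint flag is typically wired into a PyTorch training script to resume from a saved state. The checkpoint keys ("model", "optimizer", "epoch"), the stand-in nn.Linear model, and the flag name --checkpoint are assumptions for illustration only; the actual argument handling and checkpoint layout in train.py may differ.

```python
# Sketch of the common "resume from checkpoint" pattern, assuming the
# checkpoint is a dict saved with torch.save({"model": ..., "optimizer": ...,
# "epoch": ...}). Keys and names here are illustrative, not the repo's API.
import argparse
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--checkpoint", default=None,
                    help="path to a .pth checkpoint to resume from")
args = parser.parse_args()

model = nn.Linear(10, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
start_epoch = 0

if args.checkpoint is not None:
    # Restore weights, optimizer state, and the epoch counter so training
    # continues from where the checkpoint left off.
    ckpt = torch.load(args.checkpoint, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt.get("epoch", 0) + 1

for epoch in range(start_epoch, 100):
    pass  # training loop resumes at start_epoch instead of 0
```

Passing -c "checkpoint_pth" in the command above would take the load branch, so training continues from the saved epoch rather than starting from scratch.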
Thanks, it works! Closing the issue
Hi, thanks for this amazing repo and work! Could you please guide me on how to resume training from a checkpoint?