qinenergy / corda

[ICCV 2021] Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

How to continue training? #3

Closed: ygjwd12345 closed this issue 3 years ago

ygjwd12345 commented 3 years ago

When I try to resume training with a command like:

CUDA_VISIBLE_DEVICES=0 python3 -u trainUDA_gta.py --config ./configs/configUDA_gta2city.json --name UDA-gta --resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth | tee ./gta-corda.log

training runs again, but the new checkpoints do not seem to be saved.

qinenergy commented 3 years ago

Hi. The training skeleton is taken directly from DACS, and we didn't test the resume function. We trained the model uninterrupted for 250000 iterations.
For your specific use case, maybe this can help:

Change "--resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth" to "--resume ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth", because the default save folder is one level up. The new checkpoints should then show up in ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta-resume/
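Why the leading "/" matters can be checked in isolation. The last comment in this thread quotes the expression the training script uses to build the resume directory: os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable. Splitting an absolute path on '/' yields an empty first component, and os.path.join drops it, so an absolute --resume path turns into a path relative to the current working directory. The sketch below reproduces only that expression; the helper name resume_dir_as_written and the timestamp value are hypothetical, chosen to match the folder-name style above, and the outputs shown assume a POSIX system (Linux/macOS).

```python
import os

def resume_dir_as_written(resume_path: str, start_writeable: str) -> str:
    """Hypothetical helper reproducing the resume-directory expression quoted in this thread."""
    # Split on '/', then drop the last component (the checkpoint file name).
    parts = resume_path.split('/')[:-1]
    # For an absolute path, parts[0] == '' and os.path.join silently drops it.
    return os.path.join(*parts) + '_resume-' + start_writeable

stamp = '05-08_10-00'  # hypothetical start timestamp
abs_ckpt = '/saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth'
rel_ckpt = '..' + abs_ckpt  # the "../saved/..." form suggested above

print(resume_dir_as_written(abs_ckpt, stamp))
# saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta_resume-05-08_10-00
print(resume_dir_as_written(rel_ckpt, stamp))
# ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta_resume-05-08_10-00
```

With "/saved/...", new checkpoints would land under a relative saved/ folder inside the working directory, which is one plausible reason they did not appear where expected. Note also that, per the quoted code, the suffix is "_resume-" followed by the start timestamp.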

We didn't test this, and it may be easier to train from scratch for 250000 iterations to reproduce the results. Please let me know if you have further questions.

ygjwd12345 commented 3 years ago

I found that the error is caused by this block:

    if args.resume:
        checkpoint_dir = os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable
    else:
        checkpoint_dir = os.path.join(config['utils']['checkpoint_dir'], start_writeable + '-' + args.name)

I removed the "if args.resume:" branch together with the "else:" line, so only this assignment remains:

    checkpoint_dir = os.path.join(config['utils']['checkpoint_dir'], start_writeable + '-' + args.name)

The problem is solved.
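Deleting the branch sends a resumed run to a fresh timestamped folder under config['utils']['checkpoint_dir'], just like a new run. If you would rather keep resumed checkpoints next to the original run, one option is to derive the parent folder with os.path.dirname, which keeps a leading '/' intact. This is a minimal sketch, not code from the CorDA or DACS repositories: the helper make_checkpoint_dir, the base directory, and the timestamp below are hypothetical, and the sketch only computes the path (it does not create the folder).

```python
import os
from typing import Optional

def make_checkpoint_dir(resume: Optional[str], base_dir: str,
                        start_writeable: str, name: str) -> str:
    """Hypothetical replacement for the if/else block quoted above."""
    if resume:
        # dirname preserves absolute paths, unlike split('/') followed by os.path.join.
        return os.path.dirname(os.path.normpath(resume)) + '_resume-' + start_writeable
    # Fresh run: same expression as the original else branch.
    return os.path.join(base_dir, start_writeable + '-' + name)

base = '../saved/DeepLabv2-depth-gtamono-cityscapestereo'  # hypothetical config['utils']['checkpoint_dir']
ckpt = '/saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth'

print(make_checkpoint_dir(ckpt, base, '05-08_10-00', 'UDA-gta'))
# /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta_resume-05-08_10-00
print(make_checkpoint_dir(None, base, '05-08_10-00', 'UDA-gta'))
# ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-08_10-00-UDA-gta
```

The trade-off: the one-line fix above is simpler and always writes under the configured save folder, while this variant keeps each resumed run visibly tied to the checkpoint it started from.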