ygjwd12345 closed 3 years ago
Hi.
The training skeleton is taken directly from DACS, and we didn't test the resume function. We trained the model uninterrupted for 250000 iterations.
For your specific use case, maybe this can help:
Change "--resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth" to "--resume ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth", since the default save folder is one level up. The new checkpoints should then show up in ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta-resume/
We didn't test this, so it may be easier to train from scratch for 250000 iterations to reproduce the results. Please let me know if you have further questions.
I found that the error is caused by:

    if args.resume:
        checkpoint_dir = os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable
    else:
        checkpoint_dir = os.path.join(config['utils']['checkpoint_dir'], start_writeable + '-' + args.name)
I removed the resume branch:

    if args.resume:
        checkpoint_dir = os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable
    else:
The problem is solved.
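For anyone who wants to keep the resume branch instead of removing it, here is a minimal sketch of what seems to go wrong with that expression on Linux/macOS: splitting an absolute path on '/' yields an empty first element, so os.path.join returns a relative path and the leading slash is lost. The helper name resume_checkpoint_dir and the shortened path are hypothetical and only for illustration; this is not the repository's code and was not tested against it.

```python
import os


def resume_checkpoint_dir(resume_path, start_writeable):
    # Hypothetical helper (not in the repository): os.path.dirname keeps
    # the leading '/' of an absolute path, unlike split('/') + join.
    return os.path.dirname(resume_path) + '_resume-' + start_writeable


# Shortened, made-up path standing in for the real --resume argument.
resume = '/saved/run/checkpoint-iter95000.pth'

# Expression from the training script: split('/') gives ['', 'saved', 'run', ...],
# and joining those pieces produces a path relative to the working directory.
original = os.path.join(*resume.split('/')[:-1])

# Alternative that preserves whether the path is absolute or relative.
fixed = os.path.dirname(resume)

print(original)  # saved/run
print(fixed)     # /saved/run
```

With a relative argument such as "../saved/..." both expressions give the same directory, which would explain why the relative path suggested above behaves differently from the absolute one.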
When I use a script like
CUDA_VISIBLE_DEVICES=0 python3 -u trainUDA_gta.py --config ./configs/configUDA_gta2city.json --name UDA-gta --resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth | tee ./gta-corda.log
It would run again but the new checkpoint would be saved.