mit-han-lab / data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
https://arxiv.org/abs/2006.10738
BSD 2-Clause "Simplified" License

Resume training progress and see images created every tick #67

Closed. iumyx2612 closed this issue 3 years ago

iumyx2612 commented 3 years ago

I am training on my custom dataset with run_low_shot.py on Google Colab. How can I resume my training progress and see the images created every tick? I looked at the training_loop.py script, but I don't know how to use it.

zsyzzsoft commented 3 years ago

Use --resume=CHECKPOINT_PATH to resume. To see the images every tick, you can directly modify this line (in training_loop.py) to 1.

iumyx2612 commented 3 years ago

What exactly is the CHECKPOINT_PATH? I saw 2 new files in my dataset folder, one .tfrecords file and one .pkl file. And about the training_loop.py script, do I call it directly with arguments, or is it called by another script?

zsyzzsoft commented 3 years ago

*.tfrecords is the dataset file; *.pkl is the checkpoint file. You do not need to call training_loop.py, just modify it and then run run_low_shot.py.

iumyx2612 commented 3 years ago

Thanks, I get it now

iumyx2612 commented 3 years ago

I ran run_low_shot.py with --resume="the/path/to/my/datasets/*.pkl".
The other arguments to run_low_shot.py were: --DiffAugment="" --num-gpus=1 --batch-size=8 --resolution=64 --fmap-base=16384 --datasets="path/to/my/datasets"

ERROR REPORT

Traceback (most recent call last):
  File "run_low_shot.py", line 171, in <module>
    main()
  File "run_low_shot.py", line 165, in main
    run(**vars(args))
  File "run_low_shot.py", line 94, in run
    dnnlib.submit_run(**kwargs)
  File "/content/drive/My Drive/graduationthesis/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/drive/My Drive/graduationthesis/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/drive/My Drive/graduationthesis/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/drive/My Drive/graduationthesis/data-efficient-gans/DiffAugment-stylegan2/training/training_loop.py", line 160, in training_loop
    rG, rD, rGs = resume_networks
ValueError: not enough values to unpack (expected 3, got 2)
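
The last frame of the traceback shows `resume_networks` being unpacked into three names (`rG, rD, rGs`) while the loaded object holds only two items. The sketch below reproduces that failure with placeholder strings and adds a pre-check that gives a more descriptive message. The helper `unpack_resume_networks` is hypothetical and not part of the repository, and the labels G, D, Gs are only inferred from the variable names in the traceback.

```python
def unpack_resume_networks(resume_networks):
    """Hypothetical helper: unpack a loaded checkpoint into three networks.

    Mirrors the `rG, rD, rGs = resume_networks` line from the traceback,
    but raises a clearer error when the object does not hold exactly
    three items.
    """
    try:
        count = len(resume_networks)
    except TypeError:
        raise ValueError("checkpoint object is not a sequence of networks")
    if count != 3:
        raise ValueError(
            f"expected 3 networks (G, D, Gs), got {count} item(s); "
            "this .pkl may not be a training checkpoint"
        )
    rG, rD, rGs = resume_networks
    return rG, rD, rGs


if __name__ == "__main__":
    # Placeholder strings stand in for the real network objects.
    try:
        rG, rD, rGs = ("first", "second")
    except ValueError as err:
        print(err)  # not enough values to unpack (expected 3, got 2)
```

A check like this only confirms the shape of the loaded object; it does not verify that the items are usable networks.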

zsyzzsoft commented 3 years ago

Oh, *.pkl should not be in your dataset folder but in the experiment folder (usually inside a folder called results).

iumyx2612 commented 3 years ago

My training folder doesn't have any .pkl files; it contains only the __pycache__ folder and your scripts. I also took a look at the .stylegan2-cache folder, and I can't find the .pkl for my dataset there.

zsyzzsoft commented 3 years ago

Sorry, I meant the folder called results. If there are no *.pkl files in the results folder, your training did not run long enough to save a checkpoint.
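
One way to check for a checkpoint before resuming is to search the results folder for *.pkl files and pick the newest one. This is a minimal sketch using only the Python standard library. The function name `find_latest_checkpoint` is hypothetical, and choosing by modification time is an assumption; the layout of the results folder is not described in this thread.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(results_dir: str) -> Optional[Path]:
    """Return the most recently modified *.pkl under results_dir, or None.

    Assumption: the newest .pkl file is the checkpoint to resume from.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return None
    # Search all subfolders, since each run may have its own directory.
    candidates = [p for p in root.rglob("*.pkl") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "results" is the folder name mentioned in the reply above.
    checkpoint = find_latest_checkpoint("results")
    if checkpoint is None:
        print("No *.pkl found; training has not saved a checkpoint yet.")
    else:
        print(f"--resume={checkpoint}")
```

If the function returns None, there is nothing to pass to --resume yet, which matches the case described in the reply above.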

iumyx2612 commented 3 years ago

Great! Thank you so much, really appreciated