openai / improved-diffusion

Release for Improved Denoising Diffusion Probabilistic Models
MIT License
3.17k stars 480 forks

No checkpoint is saved. #94

Open mlszy928 opened 1 year ago

mlszy928 commented 1 year ago

Hi, thanks for the excellent work. I trained the model on my own CT dataset with an RTX 4090 GPU; the CT slices are converted to 3-channel grayscale images. The training command is as follows:

python scripts/image_train.py --data_dir E:/data/ddpm/png_kidney/img --image_size 256 --num_channels 128 --num_res_blocks 3 --learn_sigma True --diffusion_steps 4000 --noise_schedule cosine --use_kl True --lr 1e-4 --batch_size 4 --schedule_sampler loss-second-moment

But no checkpoints were saved after training overnight. The training output looked like this: [image]

Mirko1998 commented 1 year ago

Try setting save_interval to a smaller number in your training command. This could help. I found my results under /tmp/openai-..... That worked for me; if it doesn't work for you, one of the maintainers of this repository should be able to help.
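If the run directory is hard to locate, a short search of the system temp directory can help. The sketch below assumes that the logger falls back to a timestamped `openai-` directory under the temp directory when no log directory is configured, as the `/tmp/openai-.....` path above suggests, and that checkpoints are written there as `.pt` files. The helper name `find_checkpoints` is hypothetical and not part of the repository. On Windows the temp directory is usually somewhere under the user profile rather than `/tmp`, which `tempfile.gettempdir()` accounts for.

```python
import glob
import os
import tempfile


def find_checkpoints(root=None):
    """Return .pt files found in openai-* run directories under root.

    Hypothetical helper: assumes runs are stored in directories whose
    names start with "openai-" and that checkpoints end in ".pt".
    """
    root = root or tempfile.gettempdir()
    pattern = os.path.join(root, "openai-*", "*.pt")
    # Sort by modification time so the most recent checkpoint comes last.
    return sorted(glob.glob(pattern), key=os.path.getmtime)


if __name__ == "__main__":
    for path in find_checkpoints():
        print(path)
```

An empty result most likely means that training has not yet reached the first multiple of save_interval, or that the run wrote to a different directory.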

666wodeyy commented 4 days ago

You can also set a directory to store your checkpoints using the following command: export OPENAI_LOGDIR=dir_checkpoint
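Note that `export` is a Unix shell command, while the original poster's paths (`E:/...`) suggest Windows. One platform-independent option is to set the variable from a small Python launcher, sketched below. The function names are hypothetical, and the sketch assumes that the training script reads `OPENAI_LOGDIR` from its environment and accepts a `--save_interval` flag, as the comments above imply. The remaining flags from the original command are omitted and would need to be appended.

```python
import os
import subprocess
import sys


def build_env(log_dir):
    """Copy the current environment and point OPENAI_LOGDIR at log_dir."""
    os.makedirs(log_dir, exist_ok=True)
    env = dict(os.environ)
    env["OPENAI_LOGDIR"] = os.path.abspath(log_dir)
    return env


def build_command(data_dir, save_interval):
    """Assemble the training command (other flags omitted for brevity)."""
    return [
        sys.executable, "scripts/image_train.py",
        "--data_dir", data_dir,
        # Assumed flag name, taken from the save_interval setting above.
        "--save_interval", str(save_interval),
    ]


def launch(log_dir, data_dir, save_interval):
    """Run training with checkpoints directed to log_dir."""
    subprocess.run(
        build_command(data_dir, save_interval),
        env=build_env(log_dir),
        check=True,
    )
```

Calling `launch("dir_checkpoint", "E:/data/ddpm/png_kidney/img", 1000)` from the repository root would then start training with the log directory set; the interval of 1000 is only an illustrative value.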