Open Mitchnoff opened 1 year ago
Due to the character limit I had to remove a lot of the error message. It is posted in its entirety here in case it is needed:
Additionally, here is the script used to train the diffusion model:
@Mitchnoff, I get the same error. Could you let me know if you found a solution?
I've tried working on 64x64 images to start but am still facing the same issue. I will definitely update this if a solution is found. What happened on your side to get the same results? Were you trying to train on your own dataset?
@Mitchnoff, I get the same error. Have you found a solution?
I get the same error. Have you found a solution?
@shengshneg123 @HioZx @Mitchnoff @anicej has anyone found the solution yet?
I am attempting to train a guided diffusion model on my own custom training data. I have successfully trained a model using the improved-diffusion GitHub repository on my custom data and was able to sample images that fit the training data. I am now attempting to train a classifier on the same data and use it for the guided diffusion process. When I run my shell script, however, I get the following error:
Error Message
```
creating model and diffusion...
Traceback (most recent call last):
  File "scripts/classifier_sample.py", line 134, in
```
The scripts used for sampling, training the diffusion model, and training the classifier
This is the script that fails and gets the message above.
Image_sample.sh
```
#!/bin/bash
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 4 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
# CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True"
export OPENAI_LOGDIR="~/diffusion/guided-diffusion/outputs/first_output"
# fix the model path and classifier path
python scripts/classifier_sample.py \
    --model_path ~/path/to/model010000.pt \
    --classifier_path ~/path/to/model020000.pt \
    $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS
```
This is the script used to generate the diffusion model:
train.sh
```
#!/bin/bash
MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 4"
export OPENAI_LOGDIR="~/diffusion/improved-diffusion/training_logs/256_classcond/"
python scripts/image_train.py \
    --data_dir /data/path/to/improved_diffusion_data \
    $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
```
This is the script used to generate the classifier:
classifier_train.sh
```
#!/bin/bash
TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 4 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
export OPENAI_LOGDIR="~/diffusion/guided-diffusion/trained_classifiers/256_class_test"
python scripts/classifier_train.py \
    --data_dir /data/ur/berisha/mitch/improved_diffusion_data \
    $TRAIN_FLAGS $CLASSIFIER_FLAGS
```
Based on the error message, I believe it has to do with a mismatched architecture. I am attempting to retrain with different hyperparameters but am uncertain which ones could be causing this problem. Admittedly, I am not sure that this is even the cause, so it may well be something else.
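One way to narrow down which hyperparameters might be involved is to compare the flags passed at training time with the flags passed at sampling time. The snippet below is a minimal sketch of that comparison using the flag strings from the scripts above. The helper names `parse_flags` and `diff_flags` are hypothetical and not part of either repository. A flag reported as `<not set>` falls back to whatever default the script defines, so it is not necessarily a mismatch, and not every differing flag changes the architecture.

```python
import shlex


def parse_flags(flag_string):
    """Turn '--a 1 --b True' into {'a': '1', 'b': 'True'} (hypothetical helper)."""
    tokens = shlex.split(flag_string)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            name = token[2:]
            # Take the next token as the value unless it is another flag.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                flags[name] = tokens[i + 1]
                i += 2
                continue
            flags[name] = None
        i += 1
    return flags


def diff_flags(train, sample):
    """Return {name: (train_value, sample_value)} for flags that differ."""
    missing = "<not set>"
    names = sorted(set(train) | set(sample))
    return {
        name: (train.get(name, missing), sample.get(name, missing))
        for name in names
        if train.get(name, missing) != sample.get(name, missing)
    }


# Flag strings copied from train.sh and Image_sample.sh.
TRAIN_MODEL = "--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
SAMPLE_MODEL = (
    "--attention_resolutions 32,16,8 --class_cond True --image_size 256 "
    "--learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 "
    "--resblock_updown True --use_fp16 True --use_scale_shift_norm True"
)
# Flag strings copied from classifier_train.sh and Image_sample.sh.
TRAIN_CLASSIFIER = (
    "--image_size 256 --classifier_attention_resolutions 32,16,8 "
    "--classifier_depth 2 --classifier_width 32 --classifier_pool attention "
    "--classifier_resblock_updown True --classifier_use_scale_shift_norm True"
)
SAMPLE_CLASSIFIER = (
    "--image_size 256 --classifier_attention_resolutions 32,16,8 "
    "--classifier_depth 4 --classifier_width 32 --classifier_pool attention "
    "--classifier_resblock_updown True --classifier_use_scale_shift_norm True "
    "--classifier_scale 1.0 --classifier_use_fp16 True"
)

model_diff = diff_flags(parse_flags(TRAIN_MODEL), parse_flags(SAMPLE_MODEL))
classifier_diff = diff_flags(parse_flags(TRAIN_CLASSIFIER), parse_flags(SAMPLE_CLASSIFIER))

for label, diff in (("model", model_diff), ("classifier", classifier_diff)):
    for name, (train_value, sample_value) in diff.items():
        print(f"{label}: {name}: train={train_value} sample={sample_value}")
```

For these scripts, the comparison shows that `num_channels`, `num_res_blocks`, and `classifier_depth` are given different explicit values at training and sampling time, which may be a reasonable place to start looking.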
Any pointers in the right direction would be greatly appreciated. I am reading over the paper again and watching some videos to see if that sheds some light on how this problem is occurring. If any more information is needed to help, let me know!
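If the failure does come from a mismatched architecture, comparing parameter names and shapes between the saved checkpoint and the freshly created model can show where the two diverge. The following is a minimal sketch under that assumption. `compare_shapes` is a hypothetical helper, and the parameter names and shapes in the toy example are made up for illustration rather than taken from a real checkpoint.

```python
def compare_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings (hypothetical helper).

    Returns (missing_in_checkpoint, unexpected_in_checkpoint, shape_mismatches).
    """
    missing = sorted(set(model_shapes) - set(checkpoint_shapes))
    unexpected = sorted(set(checkpoint_shapes) - set(model_shapes))
    mismatched = {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in sorted(set(checkpoint_shapes) & set(model_shapes))
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    return missing, unexpected, mismatched


# With PyTorch, the two mappings could be built roughly like this, assuming
# the checkpoint file stores a plain state_dict and `model` is the model the
# sampling script creates from its flags:
#   import torch
#   state = torch.load(path, map_location="cpu")
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in state.items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

# Toy example with made-up parameter names and shapes.
checkpoint_shapes = {
    "input.weight": (128, 3, 3, 3),
    "out.weight": (3, 128, 3, 3),
}
model_shapes = {
    "input.weight": (256, 3, 3, 3),
    "out.weight": (6, 256, 3, 3),
    "extra.bias": (256,),
}

missing, unexpected, mismatched = compare_shapes(checkpoint_shapes, model_shapes)
print("missing in checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint={ckpt_shape} model={model_shape}")
```

Parameters whose first dimensions differ by a constant factor often point at a width-type setting, while missing or unexpected names tend to point at a depth or block-type setting, though this is only a rule of thumb.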