FurkanGozukara opened 7 months ago
@kohya-ss
Currently it uses 15.7 GB minimum on Kaggle.
So it works with the P100 GPU, but that means people can't use the much faster T4, and Kaggle gives dual T4s.
Also, those who have 16 GB GPUs can't use it properly either.
With these options, Text Encoder 2 is trained with learning rate 1e-5, because `--train_text_encoder` is specified. I think OneTrainer may train Text Encoder 1 only. If you want to stop Text Encoder 2 training, please specify `--learning_rate_te2=0`.
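For anyone hitting this from a GUI-generated config, a minimal sketch of how those flags interact, based on Kohya's comment above (the `sdxl_train.py` entry point and the surrounding arguments are illustrative, not a verified repro):

```bash
# Sketch only: train the U-Net and Text Encoder 1, keep Text Encoder 2 frozen.
# --train_text_encoder turns on text-encoder training; without an explicit
# --learning_rate_te2, TE2 appears to fall back to the main --learning_rate.
# Per Kohya's comment, passing --learning_rate_te2=0 stops TE2 from being trained.
accelerate launch sdxl_train.py \
  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
  --train_text_encoder \
  --learning_rate=1e-5 \
  --learning_rate_te2=0 \
  --output_dir=./output
```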
Wow, this is a bug in that case, because this is what the bmaltais GUI generates. I will report it to him, test, and reply back here. Thank you.
So when we don't provide a TE2 learning rate, what does the trainer use? This is a big problem for me.
Yep, I verified this bug exists and it breaks my config :/
Thank you so much, Kohya.
Hey, I am encountering the same problem today!
I have two clones of sd-scripts: one cloned in December 2023, and the other downloaded today.
The new code always reports "out of memory" with the same configuration as follows:
```bash
--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
--vae=madebyollin/sdxl-vae-fp16-fix \
--dataset_config=/home/lyh/sdvs/sd-scripts/config/finetune.toml \
--output_dir=/home/lyh/sd-scripts/output/finetune_15W \
--output_name=finetune_15W \
--save_model_as=safetensors \
--save_every_n_epochs=1 \
--save_precision="fp16" \
--max_token_length=225 \
--min_timestep=0 \
--max_timestep=1000 \
--max_train_epochs=2000 \
--learning_rate=4e-6 \
--lr_scheduler="constant" \
--optimizer_type="AdamW8bit" \
--xformers \
--gradient_checkpointing \
--gradient_accumulation_steps=128 \
--mem_eff_attn \
--mixed_precision="fp16" \
--logging_dir=logs \
```
The weird thing: the VRAM occupation with the new code: [screenshot omitted]
The VRAM occupation with the old code: [screenshot omitted]
Why? Where is the difference?
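Not addressing the regression itself, but for anyone comparing peak VRAM between versions: sd-scripts also has caching flags that usually lower peak usage. A sketch of possible additions to the command above (these flag names exist in sd-scripts, but whether they help with this particular regression is untested):

```bash
# Possible additions (a sketch, not a verified fix for this issue):
# --cache_latents runs images through the VAE once up front;
# --cache_latents_to_disk spills the cached latents to disk;
# --cache_text_encoder_outputs is only valid when the text encoders stay frozen.
--cache_latents \
--cache_latents_to_disk \
--cache_text_encoder_outputs \
```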
As I mentioned in #1141, the multi-GPU issue seems to have a different cause.
Same problem.
I have a config which was running fine on Kaggle in previous versions.
Right now it is failing on a 15 GB GPU.
This should not happen.
The same settings in OneTrainer use less than 13.5 GB of VRAM.
Here it fails with 15 GB.
It wasn't failing before.
All images are 1024x1024, all cached.
Here is the full training command I used:
I did trainings on Kaggle in the past and this exact command was working; I even have a video of it here:
https://youtu.be/16-b1AjvyBE