kohya-ss / sd-scripts


Text encoder of SD 1.5 model is not trained which is not supposed to happen #855

Open FurkanGozukara opened 1 year ago

FurkanGozukara commented 1 year ago

Here is the executed command:

accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/Realistic_Vision_V5.1.safetensors" --train_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/img" --reg_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/reg" --resolution="768,768" --output_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/model" --logging_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/log" --save_model_as=safetensors --full_bf16 --output_name="me_1e7" --lr_scheduler_num_cycles="4" --max_data_loader_n_workers="0" --learning_rate="1e-07" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="4160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0

When the text encoder is not trained, the script is supposed to print "Text Encoder is not trained."

That message is not printed either:

    train_text_encoder = args.stop_text_encoder_training is None or args.stop_text_encoder_training >= 0
    unet.requires_grad_(True)  # added just in case
    text_encoder.requires_grad_(train_text_encoder)
    if not train_text_encoder:
        accelerator.print("Text Encoder is not trained.")
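Note that this condition is False only for a negative value; with the default (None) or any non-negative value the text encoder is left trainable and no message is printed. A minimal standalone sketch of that gating logic (not the actual train_db.py code):

    # Sketch of the gating logic above; mirrors the snippet, not the real script.
    def is_text_encoder_trained(stop_text_encoder_training):
        # None (the default) or any value >= 0 keeps the text encoder trainable;
        # only a negative value disables it and triggers the printed message.
        return stop_text_encoder_training is None or stop_text_encoder_training >= 0

    assert is_text_encoder_trained(None) is True
    assert is_text_encoder_trained(999999) is True
    assert is_text_encoder_trained(-1) is False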

So how do I know the text encoder was not trained? Because I extracted a LoRA from the result and the extractor says the text encoder is the same.

I did 30 trainings and so many of them are wasted because of this bug :/
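(For context, a sketch of how such an extraction check is typically run with the repo's script; the paths and dim below are placeholders, not the exact command used:)

    python networks/extract_lora_from_models.py \
      --model_org /path/to/Realistic_Vision_V5.1.safetensors \
      --model_tuned /path/to/me_1e7.safetensors \
      --save_to /path/to/extracted_lora.safetensors \
      --dim 128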

image

@kohya-ss

FurkanGozukara commented 1 year ago

I manually set train text encoder to true and added --stop_text_encoder_training 999999.

But the LoRA extractor still says the text encoder is the same.

kohya-ss commented 1 year ago

I could reproduce the issue with the same and some other settings.

I also trained with the previous version, tag v0.6.6, and the Text Encoder is trained there. train_db.py is almost identical in both versions, so I think the most likely cause is one or more of the dependent libraries. I will look into it soon. However, since this means there is probably nothing wrong with train_db.py itself, it may take some time to find the cause.

FurkanGozukara commented 1 year ago

> I could reproduce the issue with the same and some other settings.
>
> I also trained with the previous version, tag v0.6.6, and the Text Encoder is trained there. train_db.py is almost identical in both versions, so I think the most likely cause is one or more of the dependent libraries. I will look into it soon. However, since this means there is probably nothing wrong with train_db.py itself, it may take some time to find the cause.

Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers or accelerate is broken.

You are doing an incredible job.

kohya-ss commented 1 year ago

> Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers or accelerate is broken.

I hope so too, but if there is something wrong with my script, I apologize.

FurkanGozukara commented 1 year ago

> I hope so too, but if there is something wrong with my script, I apologize.

The SDXL text encoder is also not trained:

image

FurkanGozukara commented 1 year ago

Sadly, no version is training the SDXL text encoder :(

I couldn't find a working version with bmaltais/kohya_ss.

edit: a 3-month-old sdxl branch is working for some reason

bluvoll commented 1 year ago

> Sadly, no version is training the SDXL text encoder :(
>
> I couldn't find a working version with bmaltais/kohya_ss.
>
> edit: a 3-month-old sdxl branch is working for some reason

Just add --train_text_encoder as an extra parameter and it will train the TE; I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model: if the TE has been trained enough to be different it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it on the command line too, no worries): image

You'll then get this and it will be extracted as expected

image

FurkanGozukara commented 1 year ago

> Sadly, no version is training the SDXL text encoder :( I couldn't find a working version with bmaltais/kohya_ss. edit: a 3-month-old sdxl branch is working for some reason
>
> Just add --train_text_encoder as an extra parameter and it will train the TE; I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model: if the TE has been trained enough to be different it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it on the command line too, no worries): image
>
> You'll then get this and it will be extracted as expected
>
> image

I tested with 0.01 and 0.004, both the same.

Learning rate 1e-5, 4160 steps.

Still the same.

image

image

When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.

image

FurkanGozukara commented 1 year ago

I will test with the --train_text_encoder option added too, ty.

FurkanGozukara commented 1 year ago

By the way, the difference for Stable Diffusion 1.5 is also very small. Any ideas?

It is 0.0009, with 4160 steps at 1e-6 LR.

I am using Adafactor.

FurkanGozukara commented 1 year ago

I am testing Realistic Vision 2 on the ShivamShrirao DreamBooth Colab.

I wonder how much text encoder difference it will have.

Very low LR, 4e-7, 2080 steps.

kohya-ss commented 1 year ago

I have tested with my dataset, the AdamW 8bit optimizer, and various learning rates.

Based on what I found, I believe the scripts and the libraries are fine. However, I don't know why the same settings as before would produce different training results for the Text Encoder.

I wrote another script to compare Text Encoder weights. You will find that embeddings.token_embedding and some norm weights and biases have a larger difference than the attention layers. The LoRA extraction script only takes care of the attn layers, so it determines the two Text Encoders are the same.

import argparse
import torch
from safetensors.torch import load_file

parser = argparse.ArgumentParser()
parser.add_argument("model1", help="path to model1")
parser.add_argument("model2", help="path to model2")
parser.add_argument("--rtol", type=float, default=1e-8, help="relative tolerance")
parser.add_argument("--atol", type=float, default=1e-6, help="absolute tolerance")
parser.add_argument("--bf16", action="store_true", help="use bf16 instead of fp32")
args = parser.parse_args()

model1_path = args.model1
model2_path = args.model2

# Load safetensors or checkpoint from each model path
print("loading models...")
if model1_path.endswith(".safetensors"):
    model1_sd = load_file(model1_path)
else:
    model1_sd = torch.load(model1_path)
if model2_path.endswith(".safetensors"):
    model2_sd = load_file(model2_path)
else:
    model2_sd = torch.load(model2_path)

if "state_dict" in model1_sd:
    model1_sd = model1_sd["state_dict"]
if "state_dict" in model2_sd:
    model2_sd = model2_sd["state_dict"]

# Compare the weights of each model
prefix_to_compare = "cond_stage_model"
print("comparing weights...")
print(f"key,\tall_close,\tmax_diff,\tmean_diff,\tmax_value1,\tmin_value1")
for key in model1_sd.keys():
    if key.startswith(prefix_to_compare):
        if key not in model2_sd:
            print(f"*** Key {key} not found in model2")
            continue
        if model1_sd[key].dtype == torch.long:
            # doesn't compare position ids
            # diff = torch.sum(model1_sd[key] != model2_sd[key])
            # print(f"*** {key}: long, {diff} different values")
            continue
        model1_value = model1_sd[key]
        model2_value = model2_sd[key]
        if args.bf16:
            model1_value = model1_value.to(torch.bfloat16)
            model2_value = model2_value.to(torch.bfloat16)
        model1_value = model1_value.to(torch.float32)
        model2_value = model2_value.to(torch.float32)

        all_close = torch.allclose(model1_value, model2_value, rtol=args.rtol, atol=args.atol)
        diff = torch.abs(model1_sd[key] - model2_sd[key])
        print(
            f"{key},\t{all_close},\t{torch.max(diff)},\t{torch.mean(diff)},\t{torch.max(model1_sd[key])},\t{torch.min(model1_sd[key])}"
        )
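A usage sketch for the script above, assuming it is saved as compare_te.py (the filename is arbitrary):

    python compare_te.py /path/to/original.safetensors /path/to/trained.safetensors --atol 1e-6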

FurkanGozukara commented 1 year ago

@kohya-ss thank you so much

Can we say that setting a higher text encoder learning rate could be more beneficial in this case?

Can we already give a different LR for the text encoder when doing SD 1.5 or SDXL training?

bluvoll commented 1 year ago

> @kohya-ss thank you so much
>
> Can we say that setting a higher text encoder learning rate could be more beneficial in this case?
>
> Can we already give a different LR for the text encoder when doing SD 1.5 or SDXL training?

AFAIK it doesn't have a way to specify an LR for the TE.

AIEXAAA commented 1 year ago

I may have found the problem, which can be divided into two parts:

1. The initial loss values of SD 1.5 training are different, which is related to line 1047 in library/model_util.py. If we change

   # logging.set_verbosity_error()  # don't show annoying warning
   # text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
   # logging.set_verbosity_warning()
   # print(f"config: {text_model.config}")
   cfg = CLIPTextConfig(
       vocab_size=49408,
       hidden_size=768,
       intermediate_size=3072,
       num_hidden_layers=12,
       num_attention_heads=12,
       max_position_embeddings=77,
       hidden_act="quick_gelu",
       layer_norm_eps=1e-05,
       dropout=0.0,
       attention_dropout=0.0,
       initializer_range=0.02,
       initializer_factor=1.0,
       pad_token_id=1,
       bos_token_id=0,
       eos_token_id=2,
       model_type="clip_text_model",
       projection_dim=768,
       torch_dtype="float32",
   )
   text_model = CLIPTextModel._from_config(cfg)

back to

    logging.set_verbosity_error()  # don't show annoying warning
    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
    logging.set_verbosity_warning()
    print(f"config: {text_model.config}")
    # cfg = CLIPTextConfig(
    #     vocab_size=49408,
    #     hidden_size=768,
    #     intermediate_size=3072,
    #     num_hidden_layers=12,
    #     num_attention_heads=12,
    #     max_position_embeddings=77,
    #     hidden_act="quick_gelu",
    #     layer_norm_eps=1e-05,
    #     dropout=0.0,
    #     attention_dropout=0.0,
    #     initializer_range=0.02,
    #     initializer_factor=1.0,
    #     pad_token_id=1,
    #     bos_token_id=0,
    #     eos_token_id=2,
    #     model_type="clip_text_model",
    #     projection_dim=768,
    #     torch_dtype="float32",
    # )
    # text_model = CLIPTextModel._from_config(cfg)

then the initial values will be the same.

2. The training process of SD 1.5 is different, which is related to line 228 in train_network.py. If we delete the following two lines, the training process will be the same:

    if torch.__version__ >= "2.0.0":  # the following works with xformers builds for PyTorch 2.0.0 and later
        vae.set_use_memory_efficient_attention_xformers(args.xformers)
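If one would rather keep the call than delete it, it could be gated behind an explicit opt-in. A minimal sketch (the opt-in variable and the locally constructed VAE are stand-ins, not how sd-scripts actually wires this up):

    import torch
    from diffusers import AutoencoderKL

    use_vae_xformers = False  # stand-in for a hypothetical opt-in flag, not a real sd-scripts option
    vae = AutoencoderKL()     # stand-in for the VAE the training script loads from the checkpoint

    # Only route the VAE through xformers attention when explicitly requested, so the
    # default SD 1.5 path matches the pre-SDXL behavior described above.
    if torch.__version__ >= "2.0.0" and use_vae_xformers:
        vae.set_use_memory_efficient_attention_xformers(True)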

bluvoll commented 1 year ago

> Sadly, no version is training the SDXL text encoder :( I couldn't find a working version with bmaltais/kohya_ss. edit: a 3-month-old sdxl branch is working for some reason
>
> Just add --train_text_encoder as an extra parameter and it will train the TE; I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model: if the TE has been trained enough to be different it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it on the command line too, no worries): image You'll then get this and it will be extracted as expected image
>
> I tested with 0.01 and 0.004, both the same.
>
> Learning rate 1e-5, 4160 steps.
>
> Still the same.
>
> image
>
> image
>
> When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.
>
> image

I had to use a 0.000015 LR for it to show differences in about 8k steps, so it's very slow, but the extracted LoRA had a working TE and behaved as expected.

timoshishi commented 1 year ago

> Sadly, no version is training the SDXL text encoder :(
>
> I couldn't find a working version with bmaltais/kohya_ss.
>
> edit: a 3-month-old sdxl branch is working for some reason

Can you provide the commit hash for the working branch?

FurkanGozukara commented 1 year ago

> Sadly, no version is training the SDXL text encoder :( I couldn't find a working version with bmaltais/kohya_ss. edit: a 3-month-old sdxl branch is working for some reason
>
> Can you provide the commit hash for the working branch?

I think I was mistaken, but I'm not sure. I will do more research.

This is the branch: https://github.com/bmaltais/kohya_ss/tree/sdxl-dev

kohya-ss commented 1 year ago

> Can we say that setting a higher text encoder learning rate could be more beneficial in this case?

I don't think so. I think the learning rate for the Text Encoder should generally be lower than the learning rate for the U-Net.

> Can we already give a different LR for the text encoder when doing SD 1.5 or SDXL training?

Unfortunately, it is impossible for SD 1.5. For SDXL, we can use the --block_lr option. It specifies 23 learning rate values, one for each U-Net block, like --block_lr 1e-4,2e-4,3e-4,4e-4,5e-4,6e-4,7e-4,8e-4,9e-4,0e-4,1e-5,2e-5,3e-5,4e-5,5e-5,6e-5,7e-5,8e-5,9e-5,0e-4,1e-4,2e-4,3e-4.

So if we set this option, the default learning rate is used for the Text Encoder.
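A sketch of how that combination might look on the command line (all 23 block values set to 1e-4 here; paths are placeholders and the remaining dataset/output arguments are omitted):

    # sketch only; see sdxl_train.py --help for the full argument list
    accelerate launch sdxl_train.py \
      --pretrained_model_name_or_path /path/to/sdxl_base.safetensors \
      --train_text_encoder \
      --learning_rate 1e-6 \
      --block_lr 1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4,1e-4

With --block_lr covering the U-Net blocks, --learning_rate then applies to the Text Encoder.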

FurkanGozukara commented 1 year ago

@kohya-ss ty

My text-encoder-enabled training for SDXL is about to be completed, using --train_text_encoder.

With this option it is using exactly the same VRAM, is this expected?

But it is slower, by something like 32%.

One more question:

The DreamBooth extension of Automatic1111 had a "use EMA during training" option; it significantly increased VRAM usage but also quality.

You don't have that feature?

IdiotSandwichTheThird commented 1 year ago

> @kohya-ss ty
>
> My text-encoder-enabled training for SDXL is about to be completed, using --train_text_encoder.
>
> With this option it is using exactly the same VRAM, is this expected?
>
> But it is slower, by something like 32%.
>
> One more question:
>
> The DreamBooth extension of Automatic1111 had a "use EMA during training" option; it significantly increased VRAM usage but also quality.
>
> You don't have that feature?

It sounds like you'd enjoy this repo more for training; it has an adjustable LR for text/U-Net, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

FurkanGozukara commented 1 year ago

> @kohya-ss ty. My text-encoder-enabled training for SDXL is about to be completed, using --train_text_encoder. With this option it is using exactly the same VRAM, is this expected? But it is slower, by something like 32%. One more question: the DreamBooth extension of Automatic1111 had a "use EMA during training" option; it significantly increased VRAM usage but also quality. You don't have that feature?
>
> It sounds like you'd enjoy this repo more for training; it has an adjustable LR for text/U-Net, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

Thanks, I should experiment and compare.

FurkanGozukara commented 1 year ago

@kohya-ss any way to set the LR for the text encoder?

It gets cooked super fast :D

https://twitter.com/GozukaraFurkan/status/1710416135747748150

IdiotSandwichTheThird commented 1 year ago

@kohya-ss Not sure if you've noticed, but I just tried extracting a LoRA from 2 models which I know for sure have differently trained text encoders, and I still got the above "Text Encoder is same" message. I can furthermore confirm the text encoders are different, because each will produce a different image when loaded in ComfyUI, see: https://i.imgur.com/xoQpxWo.png

Therefore I think the most likely issue lies simply with extract_lora_from_models.py erroneously thinking the two models are the same.

Edit: more testing; I have edited extract_lora_from_models.py to always treat the text encoder as different.

    # Text Encoder might be same
    # if not text_encoder_different and torch.max(torch.abs(diff)) > MIN_DIFF:
    text_encoder_different = True
    print(f"Forcing use of text encoder. {torch.max(torch.abs(diff))} > {MIN_DIFF}")

The resulting LoRA works way better than before: https://i.imgur.com/VChzcw6.jpeg The left image is with the TE extraction skipped, the right with the above modification. The right image is way closer to the style of the trained model.

kohya-ss commented 1 year ago

> My text-encoder-enabled training for SDXL is about to be completed, using --train_text_encoder.
>
> With this option it is using exactly the same VRAM, is this expected?
>
> But it is slower, by something like 32%.

The --train_text_encoder option should increase VRAM usage. But I have less experience with training Text Encoders; the result needs to be checked.

> The DreamBooth extension of Automatic1111 had a "use EMA during training" option; it significantly increased VRAM usage but also quality.
>
> You don't have that feature?

Unfortunately, there is no EMA feature currently. I would like to support it, but I think other tasks have higher priority. Of course you can use another trainer :)

kohya-ss commented 1 year ago

> @kohya-ss any way to set the LR for the text encoder?
>
> It gets cooked super fast :D

As I mentioned on X, we can use the --block_lr option to set LRs for each U-Net block. The default learning rate is then used for the Text Encoder.

kohya-ss commented 1 year ago

> More testing; I have edited extract_lora_from_models.py to always treat the text encoder as different.

I modified the script to increase MIN_DIFF before, but it seems to be too large now. I will add an option to set MIN_DIFF soon.
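A minimal sketch of what such an option could look like (the argument name and default value here are assumptions, not the actual implementation):

    import argparse
    import torch

    # Sketch only: expose the extraction threshold as a CLI option instead of a
    # hard-coded MIN_DIFF constant.
    parser = argparse.ArgumentParser()
    parser.add_argument("--min_diff", type=float, default=0.01,
                        help="minimum max weight difference for the Text Encoder to count as changed")
    args = parser.parse_args([])  # empty list so the sketch runs without CLI input

    def text_encoder_changed(weight_org: torch.Tensor, weight_tuned: torch.Tensor) -> bool:
        diff = torch.abs(weight_tuned - weight_org)
        return torch.max(diff).item() > args.min_diff

    print(text_encoder_changed(torch.zeros(4), torch.full((4,), 0.02)))  # True with the assumed default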

FurkanGozukara commented 1 year ago

@kohya-ss

I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

https://twitter.com/GozukaraFurkan/status/1710580153665925179

https://twitter.com/GozukaraFurkan/status/1710582243742142532

https://twitter.com/GozukaraFurkan/status/1710609957626810825

kohya-ss commented 1 year ago

> I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

That's nice! I don't know the prompt for the images, but I feel the right image might represent the prompt well, for example the style and the background.

mykeehu commented 1 year ago

I found it difficult to follow the dialogue because several other things are being discussed as well. Has the Text Encoder problem been fixed for SD 1.5 or not?

AIEXAAA commented 1 year ago

> I found it difficult to follow the dialogue because several other things are being discussed as well. Has the Text Encoder problem been fixed for SD 1.5 or not?

The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

mykeehu commented 1 year ago

> The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

I compared the configs, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null. I guess the part in train_network.py is because of torch 2.0, and that's why it was changed from null to float32 in the config. I don't have any other idea, because I guess the two are related.
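A quick way to double-check that is to diff the two configs field by field; a sketch (requires network access to the Hub, and reuses the hard-coded values quoted earlier in this thread):

    from transformers import CLIPTextConfig, CLIPTextModel

    # Config shipped with openai/clip-vit-large-patch14 (the repo the old code path loaded).
    hub_cfg = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").config.to_dict()

    # Hard-coded config from library/model_util.py, as quoted earlier in this thread.
    local_cfg = CLIPTextConfig(
        vocab_size=49408, hidden_size=768, intermediate_size=3072, num_hidden_layers=12,
        num_attention_heads=12, max_position_embeddings=77, hidden_act="quick_gelu",
        layer_norm_eps=1e-05, dropout=0.0, attention_dropout=0.0, initializer_range=0.02,
        initializer_factor=1.0, pad_token_id=1, bos_token_id=0, eos_token_id=2,
        model_type="clip_text_model", projection_dim=768, torch_dtype="float32",
    ).to_dict()

    # Print only the keys whose values differ between the two configs.
    for key in sorted(set(hub_cfg) | set(local_cfg)):
        if hub_cfg.get(key) != local_cfg.get(key):
            print(key, hub_cfg.get(key), local_cfg.get(key))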

I'm now using version 21.8.4 of the GUI, which @FurkanGozukara says still had good training for SD 1.5 (and I did make good LoRAs with it), and it already had the parameters you describe, so it's more likely that the bug is elsewhere.

AIEXAAA commented 1 year ago

> The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.
>
> I compared the configs, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null. I guess the part in train_network.py is because of torch 2.0, and that's why it was changed from null to float32 in the config. I don't have any other idea, because I guess the two are related.
>
> I'm now using version 21.8.4 of the GUI, which @FurkanGozukara says still had good training for SD 1.5 (and I did make good LoRAs with it), and it already had the parameters you describe, so it's more likely that the bug is elsewhere.

I’m not sure where the problem lies, but you might be right.

For me, "correct" means reproducing the SD 1.5 training results from before SDXL was introduced. I found that when the script no longer loads "openai/clip-vit-large-patch14", the initial training loss is different. And once the following was introduced:

    if torch.__version__ >= "2.0.0":  # the following works with xformers builds for PyTorch 2.0.0 and later
        vae.set_use_memory_efficient_attention_xformers(args.xformers)

the trained SD 1.5 LoRA is completely broken.

As for what you said about torch_dtype="float32": at this point we have already abandoned the reference to "openai/clip-vit-large-patch14", and the training results are already different from before.

FurkanGozukara commented 1 year ago

I am not sure, but SDXL training is far superior atm.

Here you can see my pictures, I shared 180+: https://civitai.com/user/SECourses

Best config: https://www.patreon.com/posts/89213064

Quick tutorial: https://www.youtube.com/watch?v=EEV8RPohsbw

mykeehu commented 1 year ago

@AIEXAAA I looked at this link and matched the parameters to the ones in the .py file and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation. https://huggingface.co/openai/clip-vit-large-patch14/blob/main/config.json

@FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

FurkanGozukara commented 1 year ago

> @AIEXAAA I looked at this link and matched the parameters to the ones in the .py file, and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation. https://huggingface.co/openai/clip-vit-large-patch14/blob/main/config.json
>
> @FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

For SD 1.5 I am still researching.

My older tutorial is still working great though, since it has EMA support too:

https://youtu.be/g0wXIcRhkJk

AIEXAAA commented 1 year ago

> @AIEXAAA I looked at this link and matched the parameters to the ones in the .py file, and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation. https://huggingface.co/openai/clip-vit-large-patch14/blob/main/config.json

I think I roughly understand what you're saying: when

    print(f"config: {text_model.config}")

is displayed, the values are consistent with the author's default cfg, but the problem still results in different outcomes.

As for the PyTorch issue: even if I update to 2.0 or 2.0.1, or even update this training program to the latest version, as long as I modify it in the way I mentioned earlier, the results of SD 1.5 LoRA training are consistent with those from before SDXL was introduced. Therefore, it's hard to assert that the problem is related to PyTorch 2.0.

kohya-ss commented 1 year ago

I think this issue is already solved, but #890 seems to exist. I will work on #890.

mykeehu commented 1 year ago

I'm glad you found the source of the problem. Looking forward to the fix! :)

WarAnakin commented 1 year ago

Hello everyone, I am pretty glad to see someone finally was able to identify this issue. I am the creator and founder of Team Crystal Clear; some of you might be familiar with the name, to others it may be new.

This is an issue I myself brought up in August when I first trained Crystal Clear XL. Unfortunately, at the time, whenever I mentioned that I was unable to properly train on kohya due to faulty text encoders, I was dismissed and told this was a me-related issue. So, presented with no other choice, my team and I got to work and fixed the issue ourselves so we could properly train Crystal Clear XL. It's not in my nature to make broken releases available, given that most of the work we do is commissions for game developers, the automotive industry, Stable Diffusion service providers, the brands and apparel industry, Instagram and OnlyFans influencers and models, and many other businesses. This means that pretty much every checkpoint other than CCXL has been trained on broken text encoders since August.

Now, I don't have the time to go through all these comments to know whether the issue is fixed or not, but I'm looking forward to seeing how the future changes compare to the ones we made. And as for kohya, you might not remember, but I did bring this up to you on the civitai Discord back in August.

FurkanGozukara commented 1 year ago

> Hello everyone, I am pretty glad to see someone finally was able to identify this issue. I am the creator and founder of Team Crystal Clear; some of you might be familiar with the name, to others it may be new. This is an issue I myself brought up in August when I first trained Crystal Clear XL. [...] And as for kohya, you might not remember, but I did bring this up to you on the civitai Discord back in August.

I am also doing training for companies. So far I am only using U-Net training. The results are great, but with the text encoder trained I am hoping we will get even better results.

WarAnakin commented 1 year ago

> Hello everyone, I am pretty glad to see someone finally was able to identify this issue. I am the creator and founder of Team Crystal Clear; some of you might be familiar with the name, to others it may be new. [...]
>
> I am also doing training for companies. So far I am only using U-Net training. The results are great, but with the text encoder trained I am hoping we will get even better results.

It's great to meet you, Furkan. I've always found the research you do and the dedication you have towards Stable Diffusion nothing short of outstanding. You are a wonderful content maker and I fully support and recommend your work.

BrennenRB commented 12 months ago

I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed?

FurkanGozukara commented 12 months ago

> I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed?

I am using the bmaltais GUI dev2 branch and SDXL training is working great.

mykeehu commented 12 months ago

SD 1.5 TE is still not good for LoRA training. Yesterday I tried the same training under the 21.8.4 GUI and 22.1.1 (with the updated kohya script) and got completely different results: in the latest version it was overcooked by the third epoch, while in 21.8.4 I got a perfect LoRA.

BrennenRB commented 12 months ago

> I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed?
>
> I am using the bmaltais GUI dev2 branch and SDXL training is working great.

Are you using Dreambooth or Finetune in the dev2 branch?

suede299 commented 12 months ago

I trained a LoHa on SDXL with the last two updates and tried various parameters, but always had a hard time getting satisfactory results. It could be due to a few things:

1. An optimizer that can automatically determine the learning rate cannot specify different learning rates for the TE and the U-Net.
2. train_network.py cannot set "Stop text encoder training".
3. "LoRA network weights" does not load LyCORIS correctly.
4. SDXL's two text encoders are not separated.

I wanted to train the character expressions missing from the SDXL base model into the same LoRA, so I chose LoHa, but couldn't actually get anything that worked at all.

DarkAlchy commented 11 months ago

error: unrecognized arguments: --train_text_encoder

Apparently Kohya has removed this for 1.5 training, and when the DreamBooth output model is only 2GB you know it does not contain the TE, when the model it was trained from is 4.7GB.

FurkanGozukara commented 11 months ago

SD 1.5 training trains the TE by default.

DarkAlchy commented 11 months ago

It didn't for me, but I haven't used 1.5 since 2.0 was released; I just had to use it to help test something for LyCORIS.