ostris / ai-toolkit

Various AI scripts. Mostly Stable Diffusion stuff.
MIT License
220 stars, 25 forks

Error #6

Closed: PotatoBananaApple closed this issue 10 months ago

PotatoBananaApple commented 10 months ago

Trying to run the LoRA Slider Trainer example on Windows gives me this error:

(venv) PS C:\ai-toolkit> python run.py config/examples/train_slider.example.yml
Running 1 job
{
    "type": "slider",
    "network": {
        "type": "lierla",
        "linear": 8,
        "linear_alpha": 4
    },
    "train": {
        "noise_scheduler": "ddpm",
        "steps": 500,
        "lr": 0.0002,
        "gradient_checkpointing": true,
        "train_unet": true,
        "train_text_encoder": false,
        "min_snr_gamma": 5.0,
        "optimizer": "adamw",
        "lr_scheduler": "constant",
        "max_denoising_steps": 40,
        "batch_size": 1,
        "dtype": "bf16",
        "noise_offset": 0.0
    },
    "model": {
        "name_or_path": "runwayml/stable-diffusion-v1-5",
        "is_v2": false,
        "is_v_pred": false,
        "is_xl": false
    },
    "save": {
        "dtype": "float16",
        "save_every": 50,
        "max_step_saves_to_keep": 2
    },
    "sample": {
        "sampler": "ddpm",
        "sample_every": 20,
        "width": 512,
        "height": 512,
        "prompts": [
            "a woman in a coffee shop, black hat, blonde hair, blue jacket --m -5",
            "a woman in a coffee shop, black hat, blonde hair, blue jacket --m -3",
            "a woman in a coffee shop, black hat, blonde hair, blue jacket --m 3",
            "a woman in a coffee shop, black hat, blonde hair, blue jacket --m 5",
            "a golden retriever sitting on a leather couch, --m -5",
            "a golden retriever sitting on a leather couch --m -3",
            "a golden retriever sitting on a leather couch --m 3",
            "a golden retriever sitting on a leather couch --m 5",
            "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m -5",
            "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m -3",
            "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m 3",
            "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m 5"
        ],
        "neg": "cartoon, fake, drawing, illustration, cgi, animated, anime, monochrome",
        "seed": 42,
        "walk_seed": false,
        "guidance_scale": 7,
        "sample_steps": 20,
        "network_multiplier": 1.0
    },
    "logging": {
        "log_every": 10,
        "use_wandb": false,
        "verbose": false
    },
    "slider": {
        "resolutions": [
            [
                512,
                512
            ]
        ],
        "batch_full_slide": true,
        "targets": [
            {
                "target_class": "",
                "positive": "high detail, 8k, intricate, detailed, high resolution, high res, high quality",
                "negative": "blurry, boring, fuzzy, low detail, low resolution, low res, low quality",
                "weight": 1.0,
                "shuffle": true
            }
        ]
    }
}

#############################################
# Running job: detail_slider_v1
#############################################

Running  1 process
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:04<00:00,  1.71it/s]
Error running job: 'bool' object has no attribute '__module__'

========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "C:\ai-toolkit\run.py", line 75, in <module>
    main()
  File "C:\ai-toolkit\run.py", line 71, in main
    raise e
  File "C:\ai-toolkit\run.py", line 63, in main
    job.run()
  File "C:\ai-toolkit\jobs\TrainJob.py", line 50, in run
    process.run()
  File "C:\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 236, in run
    self.sd.load_model()
  File "C:\ai-toolkit\toolkit\stable_diffusion_model.py", line 191, in load_model
    pipe = pipln.from_pretrained(
  File "C:\ai-toolkit\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1170, in from_pretrained
    model = pipeline_class(**init_kwargs)
  File "C:\ai-toolkit\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 183, in __init__
    self.register_modules(
  File "C:\ai-toolkit\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 513, in register_modules
    library = not_compiled_module.__module__.split(".")[0]
AttributeError: 'bool' object has no attribute '__module__'. Did you mean: '__mod__'?
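The last frame of this traceback shows diffusers' `register_modules` reading `__module__` from each pipeline component it registers, via `not_compiled_module.__module__.split(".")[0]`. A plain bool has no instance-level `__module__`, so any component passed as `True`/`False` instead of a module object fails exactly this way. The sketch below mimics that one line to show the failure mode. `library_of`, `bad_components`, and `FakeUNet` are hypothetical names, not diffusers API; treating `None` as a skipped component is an assumption about how diffusers handles omitted components, and the traceback does not say which argument was the bool.

```python
from typing import Any, Optional


def library_of(component: Any) -> Optional[str]:
    """Mimic the failing line from the traceback:

        library = not_compiled_module.__module__.split(".")[0]

    Assumption: None marks an intentionally omitted component and is
    skipped before the __module__ lookup.
    """
    if component is None:
        return None
    # A bool has no instance-level __module__, so this raises AttributeError.
    return component.__module__.split(".")[0]


def bad_components(components: dict[str, Any]) -> list[str]:
    """Return the names of components that would trigger the same AttributeError."""
    bad = []
    for name, component in components.items():
        try:
            library_of(component)
        except AttributeError:
            bad.append(name)
    return bad


class FakeUNet:
    # Hypothetical stand-in for a real model class; pretend it lives in diffusers.
    __module__ = "diffusers.models.unet_2d_condition"


if __name__ == "__main__":
    # "safety_checker" is only an illustrative key name.
    print(bad_components({"unet": FakeUNet(), "safety_checker": False, "feature_extractor": None}))
    # prints ['safety_checker']
```

Under the assumption above, a component that is meant to be left out would be passed as `None` rather than `False` to avoid this particular lookup.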
PotatoBananaApple commented 10 months ago

It started working after I messed around with the config file.

jaretburkett commented 10 months ago

Do you happen to remember what you changed that made it work?

PotatoBananaApple commented 10 months ago

Sorry, I don't know exactly what was causing the issue. I tried to troubleshoot a little by comparing two configs: one of them works and the other does not. This time I get the following error with the one that fails. I've attached the configs if you want to have a look; I can't find what is causing the issue.

Running  1 process
create LoRA network. base dim (rank): 8, alpha: 4
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder: 0 modules.
create LoRA for U-Net: 128 modules.
enable LoRA for U-Net
Prompt tensors not found. Encoding prompts..
Error running job: 'NoneType' object has no attribute 'text_embeds'

========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "C:\ai-toolkit\run.py", line 75, in <module>
    main()
  File "C:\ai-toolkit\run.py", line 71, in main
    raise e
  File "C:\ai-toolkit\run.py", line 63, in main
    job.run()
  File "C:\ai-toolkit\jobs\TrainJob.py", line 50, in run
    process.run()
  File "C:\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 344, in run
    self.hook_before_train_loop()
  File "C:\ai-toolkit\jobs\process\TrainSliderProcess.py", line 114, in hook_before_train_loop
    concat_prompt_pair_batch = concat_prompt_pairs(prompt_pair_batch).to('cpu')
  File "C:\ai-toolkit\toolkit\prompt_utils.py", line 88, in concat_prompt_pairs
    positive_target = concat_prompt_embeds([p.positive_target for p in prompt_pairs])
  File "C:\ai-toolkit\toolkit\prompt_utils.py", line 77, in concat_prompt_embeds
    text_embeds = torch.cat([p.text_embeds for p in prompt_embeds], dim=0)
  File "C:\ai-toolkit\toolkit\prompt_utils.py", line 77, in <listcomp>
    text_embeds = torch.cat([p.text_embeds for p in prompt_embeds], dim=0)
AttributeError: 'NoneType' object has no attribute 'text_embeds'
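Here the failure happens one step earlier than any model call: `concat_prompt_embeds` receives a list in which at least one `positive_target` is `None`, and the list comprehension stops at the first one without saying which prompt pair was affected. A minimal, hypothetical pre-check (`check_embeds` is not part of ai-toolkit) that could be run on the pairs before concatenation to report which ones are missing an embedding; `SimpleNamespace` objects stand in for the toolkit's real prompt pair objects:

```python
from types import SimpleNamespace
from typing import Any, Sequence


def check_embeds(prompt_pairs: Sequence[Any], field: str = "positive_target") -> None:
    """Raise a descriptive error if any prompt pair lacks an encoded embedding.

    Hypothetical helper: `field` names the attribute that concat_prompt_embeds()
    later reads `.text_embeds` from (positive_target in the traceback).
    """
    missing = [i for i, pair in enumerate(prompt_pairs) if getattr(pair, field, None) is None]
    if missing:
        raise ValueError(f"{field} is None for prompt pair index(es) {missing}")


if __name__ == "__main__":
    ok = SimpleNamespace(positive_target=SimpleNamespace(text_embeds="..."))
    broken = SimpleNamespace(positive_target=None)
    check_embeds([ok])  # passes silently
    try:
        check_embeds([ok, broken])
    except ValueError as err:
        print(err)  # positive_target is None for prompt pair index(es) [1]
```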

configs.zip
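Since one config works and the other fails, a key-by-key diff is a quick way to narrow it down. The toolkit echoes the config as JSON when a job starts (as in the first log in this thread), so the sketch below works on parsed JSON; `diff_configs` is a hypothetical helper, and the sample values are illustrative only, not taken from the attached configs. For the original YAML files, `yaml.safe_load` from PyYAML produces the same kind of nested dicts and lists.

```python
import json
from typing import Any


def diff_configs(a: Any, b: Any, path: str = "") -> list[str]:
    """Recursively list every key path where two parsed configs differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: list[str] = []
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a:
                out.append(f"{sub}: only in second")
            elif key not in b:
                out.append(f"{sub}: only in first")
            else:
                out.extend(diff_configs(a[key], b[key], sub))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_configs(x, y, f"{path}[{i}]"))
        return out
    # Scalars, or lists of different length: compare directly.
    return [] if a == b else [f"{path}: {a!r} != {b!r}"]


if __name__ == "__main__":
    # Illustrative values only.
    working = json.loads('{"train": {"steps": 500, "lr": 0.0002}, "slider": {"targets": [{"target_class": ""}]}}')
    failing = json.loads('{"train": {"steps": 500, "lr": 0.0001}, "slider": {"targets": [{"target_class": "person"}]}, "extra": 1}')
    for line in diff_configs(working, failing):
        print(line)
```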

PotatoBananaApple commented 10 months ago

Another thing I was wondering: is it supposed to be this slow? I have an RTX 3060 12GB.


#############################################
# Running job: test1
#############################################

Running  1 process
create LoRA network. base dim (rank): 8, alpha: 4
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder: 0 modules.
create LoRA for U-Net: 128 modules.
enable LoRA for U-Net
Prompt tensors not found. Encoding prompts..
Generating baseline samples before training
test1:   0%|▏                                           | 2/500 [00:44<2:52:32, 20.79s/it, lr: 1.0e-04 loss: 1.041e-04]
 21%|█████████████████▊                                                                 | 6/28 [00:03<00:13,  1.60it/s]
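The two progress bars use inverse units: the training bar reports seconds per iteration (20.79 s/it), while the inner sampling bar reports iterations per second (1.60 it/s). A small conversion makes the comparison explicit and reproduces the training ETA. The rate shown after only 2 of 500 steps is an early estimate, so treat the numbers as rough; the helper names are hypothetical.

```python
def seconds_per_iteration(rate: float, unit: str) -> float:
    """Normalise a tqdm-style rate ("s/it" or "it/s") to seconds per iteration."""
    if unit == "s/it":
        return rate
    if unit == "it/s":
        return 1.0 / rate
    raise ValueError(f"unknown unit: {unit}")


def format_hms(seconds: float) -> str:
    """Format a duration as H:MM:SS, like tqdm's remaining-time field."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    train = seconds_per_iteration(20.79, "s/it")  # training bar
    sample = seconds_per_iteration(1.60, "it/s")  # sampling bar
    remaining = (500 - 2) * train  # 498 training steps left
    print(f"sampling step: {sample:.3f} s, training step: {train:.2f} s")  # 0.625 s, 20.79 s
    print(f"ratio: {train / sample:.1f}x")  # 33.3x
    print(f"remaining: {format_hms(remaining)}")  # 2:52:33
```

The computed 2:52:33 matches tqdm's 2:52:32 to within a second, since the displayed rate is rounded. A slider training step and a sampling step are not necessarily the same amount of work, so the ratio shows the size of the gap rather than a like-for-like comparison. The 1.5-1.7 it/s figure quoted later in the thread is in the same range as this sampling bar.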
jaretburkett commented 10 months ago

@PotatoBananaApple That seems very slow. I have a laptop with the same GPU (the laptop version), so I'll try it out on there to see. I suspect it is overflowing VRAM. The new Windows driver won't crash in that case; it just does dynamic CPU offloading, which is SSSLLLLOOOOOOWWWWW. What kind of VRAM usage are you seeing? Are you sure it is utilizing the GPU?
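Both questions can be answered from the command line with `nvidia-smi`, which can report GPU utilization and memory in machine-readable CSV. A minimal sketch, assuming `nvidia-smi` is on PATH (it is usually installed with the NVIDIA driver); `parse_gpu_line`, `read_gpu_stats`, and `GpuStats` are hypothetical names, and the parser only handles the three fields requested here:

```python
import subprocess
from typing import NamedTuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


class GpuStats(NamedTuple):
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int


def parse_gpu_line(line: str) -> GpuStats:
    """Parse one CSV line in the form 'util, used, total' (one line per GPU)."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return GpuStats(util, used, total)


def read_gpu_stats() -> list[GpuStats]:
    """Run nvidia-smi once and return stats for every visible GPU."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        stats = read_gpu_stats()
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not run nvidia-smi: {err}")
    else:
        for i, gpu in enumerate(stats):
            headroom = gpu.memory_total_mib - gpu.memory_used_mib
            print(f"GPU {i}: {gpu.utilization_pct}% util, "
                  f"{gpu.memory_used_mib}/{gpu.memory_total_mib} MiB used, {headroom} MiB free")
```

Running this every second or two during training shows whether memory sits at the card's limit, which is the situation the offloading explanation above depends on.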

PotatoBananaApple commented 10 months ago

@jaretburkett Hey, VRAM usage does not hit the cap; there is plenty of room. CUDA utilization is above 80% while training.

PotatoBananaApple commented 10 months ago


Updating the driver did not change anything.

PotatoBananaApple commented 10 months ago

@jaretburkett I did a clean install in a sandbox with no interference from anything else and got the same 1.5-1.7 it/s. I also tried cu118, with similar speed.

What kind of speed are you getting with your laptop?

hanggun commented 10 months ago

@jaretburkett I have the same question: is 1.7 it/s very slow? I have a 2080 Ti, and when I train a normal LoRA using sd-scripts the speed is about 1.5-1.7 it/s.