huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Pixart training #5951

Closed. kopyl closed this issue 8 months ago.

kopyl commented 9 months ago

Describe the bug

I was trying to run PixArt training of the 512x512 model following your tutorial, but got this error: {'clip_sample', 'clip_sample_range'} was not found in config. Values will be initialized to default values. (more details in the "Logs" section)

What is PixArt: https://github.com/PixArt-alpha/PixArt-alpha (you have PixArtAlphaPipeline in the diffusers library)

Reproduction

  1. Install:
!git clone https://github.com/huggingface/diffusers
%cd diffusers
!pip install .
%cd examples/text_to_image
!pip install -r requirements.txt
  2. Run training:
!accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-512x512 \
  --dataset_name=lambdalabs/pokemon-blip-captions \
  --use_ema \
  --resolution=512 \
  --train_batch_size=48 \
  --max_train_steps=50000 \
  --checkpointing_steps=50 \
  --learning_rate=1e-6 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="/workspace/pixart-pokemon" \
  --noise_offset=0.05 \
  --cache_dir="/workspace/dataset-cache"

Logs

The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
11/27/2023 21:45:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'clip_sample', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'T5Tokenizer'. 
The class this function is called from is 'CLIPTokenizer'.
Traceback (most recent call last):
  File "/workspace/diffusers/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
    main()
  File "/workspace/diffusers/diffusers/examples/text_to_image/train_text_to_image.py", line 552, in main
    tokenizer = CLIPTokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/tokenization_clip.py", line 326, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 994, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', 'train_text_to_image.py', '--pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-512x512', '--dataset_name=kopyl/833-icons-dataset-1024-blip-large', '--use_ema', '--resolution=512', '--train_batch_size=48', '--max_train_steps=50000', '--checkpointing_steps=50', '--learning_rate=1e-6', '--max_grad_norm=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--output_dir=/workspace/pixart-icons', '--noise_offset=0.05', '--cache_dir=/workspace/dataset-cache']' returned non-zero exit status 1.

System Info

RTX 4090 24 GB VRAM

Who can help?

No response

patrickvonplaten commented 9 months ago

cc @sayakpaul

kopyl commented 9 months ago

@patrickvonplaten not really a bug; rather, training is not implemented for this pipeline in Diffusers.

But it would be amazing to be able to train this fascinating, cheap-to-train model with Diffusers.

I guess a script has to be written and put here so everyone can train (well, basically two scripts: one to process the data and one for the actual training, or a single script with stages selected via args like --stage="prepare" and --stage="train").
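
As a rough sketch of what such a script would need to load (assuming the standard from_pretrained APIs and the subfolder layout of the PixArt-alpha/PixArt-XL-2-512x512 checkpoint), the components below also show why the stock train_text_to_image.py crashes: it expects a CLIP tokenizer and a UNet, while PixArt ships a T5 tokenizer/encoder and a DiT-style transformer:

from transformers import T5Tokenizer, T5EncoderModel
from diffusers import AutoencoderKL, Transformer2DModel

model_id = "PixArt-alpha/PixArt-XL-2-512x512"

# PixArt uses a T5 text encoder, so CLIPTokenizer.from_pretrained() finds no
# CLIP vocab file in the checkpoint and fails with the NoneType error above.
tokenizer = T5Tokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(model_id, subfolder="text_encoder")

# The denoiser is a transformer, not the UNet the stock script loads.
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
transformer = Transformer2DModel.from_pretrained(model_id, subfolder="transformer")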

sayakpaul commented 9 months ago

Yeah. Based on the community interest we will get back to it. But we might ship fine-tuning with LoRA soon (cc: @lawrence-cj).
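
As a hypothetical illustration (a sketch using the peft library, not the script being referred to), a LoRA fine-tune of the PixArt transformer might look like this, freezing the base weights and training only low-rank adapters on the attention projections:

from peft import LoraConfig, get_peft_model
from diffusers import Transformer2DModel

transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512", subfolder="transformer"
)

# Only the low-rank adapters on the attention projections are trainable;
# the frozen base weights are what make LoRA cheap in VRAM.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()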

kopyl commented 9 months ago

@sayakpaul thank you very much :)

surebert commented 9 months ago

Do you think we will be able to do a LoRA fine-tune of PixArt with a 4090, or is there not enough VRAM?

kopyl commented 9 months ago

@surebert I have no idea, I'm not the author of this package.
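
A rough back-of-envelope, assuming fp16 weights and the published PixArt-alpha sizes: the transformer is about 0.6B parameters (roughly 1.2 GB in fp16), and the T5-XXL text encoder is the big cost at roughly 4.7B parameters (roughly 9.4 GB in fp16). Text embeddings can be precomputed so the encoder never sits in GPU memory during training, and with the base transformer frozen, LoRA gradients and optimizer states cover only the small adapter weights, so 24 GB plausibly suffices.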

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Qjizhi commented 2 months ago

> Yeah. Based on the community interest we will get back to it. But we might ship fine-tuning with LoRA soon (cc: @lawrence-cj).

Hi, can we train PixArt in diffusers now?

Qjizhi commented 2 months ago

> @patrickvonplaten not really a bug; rather, training is not implemented for this pipeline in Diffusers.
>
> But it would be amazing to be able to train this fascinating, cheap-to-train model with Diffusers.
>
> I guess a script has to be written and put here so everyone can train (well, basically two scripts: one to process the data and one for the actual training, or a single script with stages selected via args like --stage="prepare" and --stage="train").

Hi, can we train PixArt in diffusers now?

sayakpaul commented 2 months ago

We don't have a training script for PixArt in diffusers. Refer to https://github.com/bghira/SimpleTuner/, which provides support for PixArt training.