Closed YujiaKCL closed 3 months ago
requirements.txt is updated. Could you please update the requirements with pip install --use-pep517 --upgrade -r requirements.txt?
Thanks for the response. I have updated the dependencies, but it is still not working. The environment seems fine, but the T5 model structure is not correctly configured.
For T5, I use the t5xxl_fp16.safetensors from https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main/text_encoders. Unlike CLIP-L, transformers.T5EncoderModel does not have a "text_model" attribute; it has "encoder" instead.
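A minimal sketch that demonstrates the attribute difference (it downloads small stand-in checkpoints: t5-small in place of t5xxl, and openai/clip-vit-large-patch14 in place of the local clip_l weights):

    from transformers import CLIPTextModel, T5EncoderModel

    # Small stand-in checkpoints, used only to inspect the module layout.
    clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    t5 = T5EncoderModel.from_pretrained("t5-small")

    print(hasattr(clip, "text_model"))  # True:  CLIP wraps its transformer in .text_model
    print(hasattr(t5, "text_model"))    # False: T5 exposes .encoder instead
    print(hasattr(t5, "encoder"))       # True
    print(type(t5.shared).__name__)     # "Embedding": T5's token embedding lives in .shared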
Could you please share the full stack trace of the error?
Error message and command (T5 printed):
accelerate launch --num_processes 1 --mixed_precision bf16 --num_cpu_threads_per_process 8 flux_train_network.py \
--pretrained_model_name_or_path /pfs/mt-BzjfJP/yx/projs/kohya_flux/models/FLUX.1-dev/flux1-dev.sft \
--clip_l /pfs/mt-BzjfJP/yx/projs/kohya_flux/models/FLUX.1-dev/clip_l.safetensors \
--t5xxl /pfs/mt-BzjfJP/yx/projs/kohya_flux/models/FLUX.1-dev/t5xxl_fp16.safetensors \
--ae /pfs/mt-BzjfJP/yx/projs/kohya_flux/models/FLUX.1-dev/ae.sft \
--resolution=1024,1024 \
--cache_latents_to_disk --cache_text_encoder_outputs_to_disk --save_model_as safetensors --sdpa \
--persistent_data_loader_workers --max_data_loader_n_workers 8 --seed 42 \
--gradient_checkpointing --mixed_precision bf16 --save_precision bf16 \
--network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 1e-4 \
--network_train_unet_only --fp8_base --highvram --max_train_epochs 4 --save_every_n_epochs 1 \
--output_dir outputs --output_name NAME --timestep_sampling sigmoid \
--model_prediction_type raw --guidance_scale 1.0 --loss_type l2 \
--train_data_dir testing_datasets/ds3 \
--output_dir outputs/test \
--logging_dir logs/test
The following values were not passed to accelerate launch and had defaults used instead:
    --num_machines was set to a value of 1
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/pfs/mt-BzjfJP/yx/projs/kohya_flux/sd-scripts/venv/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/pfs/mt-BzjfJP/yx/projs/kohya_flux/sd-scripts/venv/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/pfs/mt-BzjfJP/yx/projs/kohya_flux/sd-scripts/venv/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
highvram is enabled / highvramが有効です
2024-08-13 20:14:07 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なため、cache_latentsを有効にします train_util.py:3899
/pfs/mt-BzjfJP/yx/projs/kohya_flux/sd-scripts/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-08-13 20:14:10 INFO Using DreamBooth method. train_network.py:276
WARNING ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: .ipynb_checkpoints config_util.py:589
WARNING ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: captions config_util.py:589
INFO prepare images. train_util.py:1807
INFO get image size from name of cache files train_util.py:1745
100%|██████████| 100/100 [00:00<00:00, 2048.73it/s]
INFO set image size from cache files: 100/100 train_util.py:1752
INFO found directory /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5_动漫_新海诚风格 contains 100 image files train_util.py:1754
WARNING No caption file found for 100 images. Training will continue without captions for these images. If class token exists, it will be used. / train_util.py:1785
100枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (1).jpg train_util.py:1792
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (10).jpg train_util.py:1792
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (100).jpg train_util.py:1792
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (11).jpg train_util.py:1792
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (12).jpg train_util.py:1792
WARNING /pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5image (13).jpg... and 95 more train_util.py:1790
INFO 500 train images with repeating. train_util.py:1848
INFO 0 reg images. train_util.py:1851
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1856
INFO [Dataset 0] config_util.py:570
batch_size: 1
resolution: (1024, 1024)
enable_bucket: False
network_multiplier: 1.0
[Subset 0 of Dataset 0]
image_dir: "/pfs/mt-BzjfJP/yx/projs/kohya_flux/testing_datasets/ds3/5_images"
image_count: 100
num_repeats: 5
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
alpha_mask: False,
is_reg: False
class_tokens: tok
caption_extension: .caption
INFO [Dataset 0] config_util.py:576
INFO loading image sizes. train_util.py:876
100%|██████████| 100/100 [00:00<00:00, 2129088.32it/s]
INFO prepare dataset train_util.py:884
INFO preparing accelerator train_network.py:329
accelerator device: cuda
INFO Building CLIP flux_utils.py:48
INFO Loading state dict from /pfs/mt-BzjfJP/yx/projs/kohya_flux/models/FLUX.1-dev/clip_l.safetensors flux_utils.py:141
2024-08-13 20:14:11 INFO Loaded CLIP:
Pip: absl-py 2.1.0 accelerate 0.33.0 aiohappyeyeballs 2.3.5 aiohttp 3.10.3 aiosignal 1.3.1 altair 4.2.2 async-timeout 4.0.3 attrs 24.2.0 bitsandbytes 0.43.3 certifi 2024.7.4 charset-normalizer 3.3.2 diffusers 0.25.0 easygui 0.98.3 einops 0.7.0 entrypoints 0.4 filelock 3.15.4 frozenlist 1.4.1 fsspec 2024.6.1 ftfy 6.1.1 grpcio 1.65.4 huggingface-hub 0.24.5 idna 3.7 imagesize 1.4.1 importlib_metadata 8.2.0 Jinja2 3.1.4 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 library 0.0.0 /pfs/mt-BzjfJP/yx/projs/kohya_flux/sd-scripts lightning-utilities 0.11.6 lion-pytorch 0.0.6 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 mdurl 0.1.2 mpmath 1.3.0 multidict 6.0.5 networkx 3.3 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.6.20 nvidia-nvtx-cu12 12.1.105 opencv-python 4.7.0.68 packaging 24.1 pandas 2.2.2 pillow 10.4.0 pip 21.2.3 prodigyopt 1.0 protobuf 4.25.4 psutil 6.0.0 Pygments 2.18.0 python-dateutil 2.9.0.post0 pytorch-lightning 1.9.0 pytz 2024.1 PyYAML 6.0.2 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 rich 13.7.0 rpds-py 0.20.0 safetensors 0.4.2 sentencepiece 0.2.0 setuptools 57.4.0 six 1.16.0 sympy 1.13.2 tensorboard 2.17.0 tensorboard-data-server 0.7.2 tokenizers 0.19.1 toml 0.10.2 toolz 0.12.1 torch 2.4.0 torchmetrics 1.4.1 torchvision 0.19.0 tqdm 4.66.5 transformers 4.44.0 triton 3.0.0 typing_extensions 4.12.2 tzdata 2024.1 urllib3 2.2.2 voluptuous 0.13.1 wcwidth 0.2.13 Werkzeug 3.0.3 xformers 0.0.27.post2 yarl 1.9.4 zipp 3.20.0
It looks like you're using an old version of the source code. Please git pull to the latest version.
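For example, from inside the sd-scripts checkout (with its venv active):

    git pull
    pip install --use-pep517 --upgrade -r requirements.txt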
It works now. The key is to ensure cache_text_encoder_outputs=True; setting only cache_text_encoder_outputs_to_disk=True causes the error.
When cache_text_encoder_outputs=False and T5 is loaded on the GPU, the following lines cause the error:
    if hasattr(t_enc.text_model, "embeddings"):  # t_enc.text_model itself raises AttributeError for T5EncoderModel
        # nn.Embedding not support FP8
        t_enc.text_model.embeddings.to(dtype=(weight_dtype if te_weight_dtype != weight_dtype else te_weight_dtype))
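One possible guard that handles both encoders is sketched below (illustrative only; get_embeddings_module is a made-up helper, and the repository's actual fix may differ):

    import torch.nn as nn

    def get_embeddings_module(t_enc) -> nn.Module:
        # Illustrative helper, not from the repository.
        # CLIPTextModel keeps its transformer (and embeddings) under .text_model.
        if hasattr(t_enc, "text_model"):
            return t_enc.text_model.embeddings
        # T5EncoderModel has no .text_model; its token embedding is the .shared nn.Embedding.
        if hasattr(t_enc, "shared"):
            return t_enc.shared
        raise TypeError(f"unsupported text encoder: {type(t_enc).__name__}")

    # nn.Embedding does not support FP8, so keep the embeddings in the training dtype:
    # get_embeddings_module(t_enc).to(dtype=weight_dtype)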
Thank you for reporting the issue. I will set cache_text_encoder_outputs to True when cache_text_encoder_outputs_to_disk=True.
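Something along these lines (a sketch only; coerce_cache_flags is a placeholder name, and the actual change may look different):

    import logging

    logger = logging.getLogger(__name__)

    def coerce_cache_flags(args) -> None:
        # Caching to disk implies caching, mirroring the existing
        # cache_latents_to_disk -> cache_latents coupling seen in the log above.
        if args.cache_text_encoder_outputs_to_disk and not args.cache_text_encoder_outputs:
            logger.warning(
                "cache_text_encoder_outputs_to_disk is enabled, so cache_text_encoder_outputs is also enabled"
            )
            args.cache_text_encoder_outputs = True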
I updated the code. Thanks!
The flux inference script works fine, but the training script fails with the following error: AttributeError: 'T5EncoderModel' object has no attribute 'text_model'.
Has anyone encountered the same issue?