kohya-ss / sd-scripts

Apache License 2.0

The “Windows Installation” procedure in the readme is out of date and causes an error. #1659

Closed: sig-ume closed this issue 1 month ago

sig-ume commented 1 month ago

The “Windows Installation” section of the readme lists the following command, but it is out of date and no longer installs as written.

PS C:\WINDOWS\system32> pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0+cu118, 2.2.1+cu118, 2.2.2+cu118, 2.3.0+cu118, 2.3.1+cu118, 2.4.0+cu118, 2.4.1+cu118)
ERROR: No matching distribution found for torch==2.1.2

I am trying to proceed with the installation using a different torch version, but I am concerned about compatibility. If you don't mind, could you please update the readme with the current installation procedure?

kohya-ss commented 1 month ago

The code will probably work with PyTorch 2.2 and later. Also, the sd3 branch is using 2.4, so I expect to be able to merge it in the near future.

However, the official PyTorch website provides the following installation instructions, so I think this may be a temporary problem: https://pytorch.org/get-started/previous-versions/

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
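As a rough illustration of the advice above (not part of the original reply), the version list in the pip error can be filtered for releases at or above 2.2. This is a minimal sketch: the helper names `available_versions`, `release_tuple`, and `candidates` are hypothetical, the error text is copied from this issue, and the parser assumes plain numeric releases with an optional `+cu118`-style local tag.

```python
import re

# Error text copied from the pip output reported in this issue.
PIP_ERROR = (
    "ERROR: Could not find a version that satisfies the requirement torch==2.1.2 "
    "(from versions: 2.2.0+cu118, 2.2.1+cu118, 2.2.2+cu118, 2.3.0+cu118, "
    "2.3.1+cu118, 2.4.0+cu118, 2.4.1+cu118)"
)


def available_versions(pip_error: str) -> list[str]:
    """Extract the '(from versions: ...)' list from a pip error message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def release_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.2.0+cu118' into (2, 2, 0), dropping the local build tag.

    Assumption: versions are plain numeric releases (no 'rc' or 'dev' parts).
    """
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def candidates(pip_error: str, minimum: tuple[int, ...] = (2, 2)) -> list[str]:
    """Return the listed versions that are at least `minimum`."""
    return [
        v for v in available_versions(pip_error) if release_tuple(v) >= minimum
    ]


if __name__ == "__main__":
    for version in candidates(PIP_ERROR):
        print(version)
```

Whether any particular version actually works with sd-scripts still has to be checked by running it; the filter only narrows down what the index currently offers.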

sig-ume commented 1 month ago

(I heard that the developer is Japanese, so I will write in Japanese from here on.) Thank you for your reply. I originally ran sd-scripts with the command below and got an error, so I asked my question because I suspected a version mismatch in torch or a related package.

Command:
accelerate launch --num_cpu_threads_per_process 1 train_network.py --pretrained_model_name_or_path=C:\StablilityMatrix\Data\Models\StableDiffusion\animagineXLV31_v31.safetensors --output_dir=C:\git\sd-scripts\TrainingData\outputs --output_name=taiyo_style --dataset_config=C:\git\sd-scripts\TrainingData\datasetconfig.toml --train_batch_size=1 --max_train_epochs=10 --resolution=512,512 --optimizer_type=AdamW8bit --learning_rate=1e-4 --network_dim=128 --network_alpha=64 --enable_bucket --bucket_no_upscale --lr_scheduler=cosine_with_restarts --lr_scheduler_num_cycles=4 --lr_warmup_steps=500 --keep_tokens=1 --shuffle_caption --caption_dropout_rate=0.05 --save_model_as=safetensors --clip_skip=2 --seed=42 --color_aug --xformers --mixed_precision=fp16 --network_module=networks.lora --persistent_data_loader_workers --v2 --v_parameterization

Error output:

 size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
.....

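The error looked the same as the one in #344, so I added "--v2 --v_parameterization" to the command, but unfortunately the error did not change. Does this mean that sd-scripts does not support the safetensors file I am using?

I am sorry to bother you when you are busy, but I would appreciate your guidance.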

kohya-ss commented 1 month ago

The model specified in --pretrained_model_name_or_path is animagineXLV31_v31, so you need to use the training script for SDXL LoRA. Please use sdxl_train_network.py instead of train_network.py. Thank you.

sig-ume commented 1 month ago

Thank you! I was able to start training!