kohya-ss / sd-scripts

SDXL LoRA training, OutOfMemoryError #648

Closed by lifeisboringsoprogramming 1 year ago

lifeisboringsoprogramming commented 1 year ago

I managed to launch sdxl_train_network.py with twenty 512x512 images, each repeated 27 times.

My training TOML is as follows:

pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-0.9"
caption_extension = ".txt"
resolution = "1024,1024"
cache_latents = true
enable_bucket = true
bucket_no_upscale = true
output_dir = "/home/lifeisboring/training/sdxl/model"
output_name = "libsp"
save_precision = "fp16"
save_every_n_epochs = 1
train_batch_size = 2
max_token_length = 225
xformers = true
max_train_epochs = 6
persistent_data_loader_workers = true
gradient_checkpointing = true
mixed_precision = "fp16"
logging_dir = "/home/lifeisboring/training/sdxl/log"
sample_every_n_epochs = 1
sample_prompts = "/home/lifeisboring/training/sdxl/prompt.txt"
sample_sampler = "euler_a"
optimizer_type = "AdamW8bit"
learning_rate = 0.0005
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 162
lr_scheduler_num_cycles = 6
dataset_config = "/home/lifeisboring/training/sdxl/dataset_config.toml"
unet_lr = 0.0005
text_encoder_lr = 0.0001
network_module = "networks.lora"
network_dim = 16
network_alpha = 8
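
For reference, the step count implied by this config can be checked by hand. The sketch below is only an illustration: the helper name `total_steps` is hypothetical, and it assumes a single bucket, a single process, no gradient accumulation, and that a partial final batch still counts as a step. Under those assumptions it reproduces the 1620 steps reported in the log, and shows that `lr_warmup_steps = 162` is 10% of the run.

```python
import math


def total_steps(num_images, num_repeats, batch_size, epochs):
    """Return (steps_per_epoch, total_steps) for a single-bucket dataset.

    Hypothetical helper for illustration; not part of sd-scripts.
    """
    samples_per_epoch = num_images * num_repeats
    # A partial final batch is assumed to count as one step.
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Values from the post: 20 images, 27 repeats,
# train_batch_size = 2, max_train_epochs = 6, lr_warmup_steps = 162.
per_epoch, total = total_steps(20, 27, 2, 6)
warmup_fraction = 162 / total

print(f"steps per epoch: {per_epoch}")
print(f"total steps:     {total}")
print(f"warmup fraction: {warmup_fraction:.0%}")
```

Note that changing `train_batch_size` changes the total step count, so a fixed `lr_warmup_steps` value will cover a different share of the run.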

I only have 12 GB of VRAM, and I got the following error. How much VRAM do I need for SDXL LoRA training? Thank you very much.

    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.76 GiB total capacity; 9.60 GiB already allocated; 52.50 MiB free; 9.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
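
The error message hints at fragmentation ("If reserved memory is >> allocated memory"), so it can help to compare the two figures before reaching for max_split_size_mb. The following is a rough sketch that pulls the numbers out of the message text above with regular expressions; `parse_oom` is a hypothetical helper, and the patterns assume the message wording shown here, which may differ in other PyTorch versions.

```python
import re

# Message text copied from the error above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.76 GiB total capacity; "
    "9.60 GiB already allocated; 52.50 MiB free; 9.79 GiB reserved in total by PyTorch)"
)

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the memory figures (converted to MiB) from an OOM message.

    Hypothetical helper; assumes the wording of the message above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        figures[name] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return figures


figures = parse_oom(OOM_MESSAGE)
slack = figures["reserved"] - figures["allocated"]

print(f"reserved - allocated: {slack:.0f} MiB")
print(f"allocated / total:    {figures['allocated'] / figures['total']:.0%}")
```

Here reserved memory exceeds allocated memory by only about 0.19 GiB, so the "reserved >> allocated" condition does not really hold, and tuning the allocator is unlikely to free much. The more likely reading is that the model simply does not fit at these settings.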

The full logs are as follows:

(sdscripts) @amd:/home/lifeisboring/sd-scripts$ accelerate launch --num_cpu_threads_per_process=2 "sdxl_train_network.py" --config_file=/home/lifeisboring/training/sdxl/train_network_config.toml --no_half_vae
2023-07-15 00:38:29.413181: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-15 00:38:29.551999: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-07-15 00:38:29.988808: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-07-15 00:38:29.988849: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-07-15 00:38:29.988854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-07-15 00:38:31.703858: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-15 00:38:31.853379: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-07-15 00:38:32.286116: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-07-15 00:38:32.286156: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-07-15 00:38:32.286162: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading settings from /home/lifeisboring/training/sdxl/train_network_config.toml...
/home/lifeisboring/training/sdxl/train_network_config
prepare tokenizers
update token length: 225
Loading dataset config from /home/lifeisboring/training/sdxl/dataset_config.toml
prepare images.
found directory /home/lifeisboring/training/dataset/27_libsp woman contains 20 image files
540 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "/home/lifeisboring/training/dataset/27_libsp woman"
    image_count: 20
    num_repeats: 27
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: libsp woman
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 2728.09it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 540
mean ar error (without repeats): 0.0
noise_offset is set to 0.0357 / noise_offsetが0.0357に設定されました
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: stabilityai/stable-diffusion-xl-base-0.9, variant=fp16
Couldn't connect to the Hub: 401 Client Error. (Request ID: Root=1-64b24cfa-6d4eb05d0cc850e2677b169d)

Repository Not Found for url: https://huggingface.co/api/models/stabilityai/stable-diffusion-xl-base-0.9.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password..
Will try to load from local cache.
U-Net converted to original U-Net
Enable xformers for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 640351.76it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.48it/s]
create LoRA network. base dim (rank): 16, alpha: 8
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder 1:
create LoRA for Text Encoder 2:
create LoRA for Text Encoder: 264 modules.
create LoRA for U-Net: 722 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:93: UserWarning: /home/lifeisboring/anaconda3/envs/sdscripts did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/cv2/../../lib64')}
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/cv2/../../lib64: did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('local/amd'), PosixPath('@/tmp/.ICE-unix/1790,unix/amd')}
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/share/gconf/ubuntu.mandatory.path')}
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
  warn(
/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/share/gconf/ubuntu.default.path')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
use 8-bit AdamW optimizer | {}
override steps. steps for 6 epochs is / 指定エポックまでのステップ数: 1620
Traceback (most recent call last):
  File "/home/lifeisboring/sd-scripts/sdxl_train_network.py", line 167, in <module>
    trainer.train(args)
  File "/home/lifeisboring/sd-scripts/train_network.py", line 365, in train
    unet, t_enc1, t_enc2, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/accelerator.py", line 1143, in prepare
    result = tuple(
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/accelerator.py", line 1144, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/accelerator.py", line 995, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/accelerator.py", line 1218, in prepare_model
    model = model.to(self.device)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.00 MiB (GPU 0; 11.76 GiB total capacity; 9.59 GiB already allocated; 59.69 MiB free; 9.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "/home/lifeisboring/anaconda3/envs/sdscripts/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/home/lifeisboring/anaconda3/envs/sdscripts/lib/python3.10/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lifeisboring/anaconda3/envs/sdscripts/bin/python', 'sdxl_train_network.py', '--config_file=/home/lifeisboring/training/sdxl/train_network_config.toml', '--no_half_vae']' returned non-zero exit status 1.
sdbds commented 1 year ago

Set batch_size to 1, or try the new PR with the Paged optimizer.
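
As a minimal sketch of the first suggestion, assuming the config is the TOML from the original post (where the key is `train_batch_size`), the value can be patched with a small text substitution. The helper name `set_batch_size` is hypothetical, and the regex only handles a plain top-level `key = integer` line; editing the file by hand works just as well.

```python
import re


def set_batch_size(toml_text, batch_size):
    """Return toml_text with the train_batch_size value replaced.

    Hypothetical helper; expects exactly one `train_batch_size = <int>` line.
    """
    pattern = re.compile(r"^(train_batch_size[ \t]*=[ \t]*)\d+[ \t]*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{batch_size}", toml_text)
    if count != 1:
        raise ValueError(f"expected exactly one train_batch_size line, found {count}")
    return new_text


# A three-line excerpt of the config from the original post.
config = "save_every_n_epochs = 1\ntrain_batch_size = 2\nmax_token_length = 225\n"
patched = set_batch_size(config, 1)
print(patched)
```

The substitution leaves every other line untouched, so the rest of the training setup stays as posted.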

lifeisboringsoprogramming commented 1 year ago

@sdbds Thanks, I can train with batch size 1 now.