kohya-ss / sd-scripts

Apache License 2.0

NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device. #1665

Open · Wolchenok57 opened this issue 2 weeks ago

Wolchenok57 commented 2 weeks ago

A new de-distilled version of flux.1 dev that may be easier to train (https://huggingface.co/nyanko7/flux-dev-de-distill ; https://huggingface.co/MinusZoneAI/flux-dev-de-distill-fp8) raises this error: `NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.`
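For background, this is generic PyTorch behavior rather than anything specific to sd-scripts: a tensor on the `meta` device records only shape and dtype and has no underlying storage, so `Module.to()` has nothing to copy. A minimal standalone sketch (plain PyTorch, unrelated to the trainer code) that reproduces the same exception:

```python
import torch
import torch.nn as nn

# Modules built under the meta device get shapes/dtypes but no actual storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

try:
    layer.to("cpu")  # raises: there is no data to copy out of a meta tensor
except NotImplementedError as e:
    print(e)

# to_empty() allocates uninitialized storage on the target device instead;
# real weights must then be supplied, e.g. via load_state_dict().
layer = layer.to_empty(device="cpu")
```

Note that `to_empty()` only sidesteps the copy; the resulting weights are garbage until a state dict supplies real values.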

I tested this with fluxgym, comfyui-flux-trainer, and the flux branch of sd-scripts, and the error is the same in all of them. Full traceback with the launch command (my env folder lives in the fluxgym folder; I do not have an infinite NVMe SSD):

```
(env) I:\neurostuff\kohya_ss(f1)>accelerate launch ^
  --mixed_precision bf16 ^
  --num_cpu_threads_per_process 1 ^
  sd-scripts/flux_train_network.py ^
  --pretrained_model_name_or_path "I:\neurostuff\ComfyUI\models\unet\FLUX1\consolidated_s6700_fp8.safetensors" ^
  --clip_l "I:\neurostuff\ComfyUI\models\clip\clip_l.safetensors" ^
  --t5xxl "I:\neurostuff\ComfyUI\models\clip\t5xxl_fp16.safetensors" ^
  --ae "I:\neurostuff\ComfyUI\models\vae\FLUX1\ae.sft" ^
  --cache_latents_to_disk ^
  --save_model_as safetensors ^
  --sdpa ^
  --persistent_data_loader_workers ^
  --max_data_loader_n_workers 2 ^
  --seed 42 ^
  --gradient_checkpointing ^
  --mixed_precision bf16 ^
  --save_precision bf16 ^
  --network_module networks.lora_flux ^
  --network_dim 16 ^
  --network_alpha 16.0 ^
  --optimizer_type adafactor ^
  --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" ^
  --lr_scheduler constant_with_warmup ^
  --max_grad_norm 0.0 ^
  --learning_rate 8e-4 ^
  --cache_text_encoder_outputs ^
  --cache_text_encoder_outputs_to_disk ^
  --fp8_base ^
  --highvram ^
  --max_train_epochs 1 ^
  --save_every_n_epochs 1 ^
  --dataset_config "I:\neurostuff\fluxgym\outputs\megafignya\dataset.toml" ^
  --output_dir "I:\neurostuff\fluxgym\outputs\megafignya" ^
  --output_name megafignya ^
  --timestep_sampling shift ^
  --discrete_flow_shift 3.1582 ^
  --model_prediction_type raw ^
  --guidance_scale 1 ^
  --loss_type l2 ^
  --apply_t5_attn_mask ^
  --weighting_scheme logit_normal ^
  --logit_mean 0.0 ^
  --logit_std 1.0 ^
  --mode_scale 1.29 ^
  --sigmoid_scale 1.0 ^
  --enable_bucket ^
  --bucket_no_upscale ^
  --min_bucket_reso 256 ^
  --max_bucket_reso 1024
```

```
highvram is enabled / highvramが有効です
2024-10-03 18:20:21 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なため、cache_latentsを有効にします  train_util.py:3895
2024-10-03 18:20:21 INFO t5xxl_max_token_length: 512  flux_train_network.py:144
I:\neurostuff\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-10-03 18:20:22 INFO Loading dataset config from I:\neurostuff\fluxgym\outputs\megafignya\dataset.toml  train_network.py:270
                    INFO prepare images.  train_util.py:1803
                    INFO get image size from name of cache files  train_util.py:1741
100%|██████████| 1148/1148 [00:09<00:00, 121.06it/s]
2024-10-03 18:20:32 INFO set image size from cache files: 1148/1148  train_util.py:1748
                    INFO found directory I:\neurostuff\DATASET[REDACTED] contains 1148 image files  train_util.py:1750
                    INFO 1148 train images with repeating.  train_util.py:1844
                    INFO 0 reg images.  train_util.py:1847
                    WARNING no regularization images / 正則化画像が見つかりませんでした  train_util.py:1852
                    INFO [Dataset 0]  config_util.py:570
                      batch_size: 1
                      resolution: (512, 512)
                      enable_bucket: True
                      network_multiplier: 1.0
                      min_bucket_reso: 256
                      max_bucket_reso: 1024
                      bucket_reso_steps: 64
                      bucket_no_upscale: True

                      [Subset 0 of Dataset 0]
                        image_dir: "I:\neurostuff\DATASET[REDACTED]"
                        image_count: 1148
                        num_repeats: 1
                        shuffle_caption: False
                        keep_tokens: 1
                        keep_tokens_separator:
                        caption_separator: ,
                        secondary_separator: None
                        enable_wildcard: False
                        caption_dropout_rate: 0.0
                        caption_dropout_every_n_epoches: 0
                        caption_tag_dropout_rate: 0.0
                        caption_prefix: None
                        caption_suffix: None
                        color_aug: False
                        flip_aug: False
                        face_crop_aug_range: None
                        random_crop: False
                        token_warmup_min: 1, token_warmup_step: 0
                        alpha_mask: False
                        is_reg: False
                        class_tokens: None
                        caption_extension: .txt
                    INFO [Dataset 0]  config_util.py:576
                    INFO loading image sizes.  train_util.py:876
100%|██████████| 1148/1148 [00:00<?, ?it/s]
                    INFO make buckets  train_util.py:882
                    WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます  train_util.py:899
                    INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)  train_util.py:928
                    INFO bucket 0: resolution (128, 512), count: 3  train_util.py:933
                    INFO bucket 1: resolution (192, 512), count: 1  train_util.py:933
                    INFO bucket 2: resolution (256, 512), count: 38  train_util.py:933
                    INFO bucket 3: resolution (320, 512), count: 310  train_util.py:933
                    INFO bucket 4: resolution (384, 512), count: 94  train_util.py:933
                    INFO bucket 5: resolution (448, 448), count: 1  train_util.py:933
                    INFO bucket 6: resolution (448, 512), count: 30  train_util.py:933
                    INFO bucket 7: resolution (512, 192), count: 3  train_util.py:933
                    INFO bucket 8: resolution (512, 256), count: 104  train_util.py:933
                    INFO bucket 9: resolution (512, 320), count: 118  train_util.py:933
                    INFO bucket 10: resolution (512, 384), count: 65  train_util.py:933
                    INFO bucket 11: resolution (512, 448), count: 37  train_util.py:933
                    INFO bucket 12: resolution (512, 512), count: 344  train_util.py:933
                    INFO mean ar error (without repeats): 0.06703154647887108  train_util.py:938
                    INFO network for CLIP-L only will be trained. T5XXL will not be trained / CLIP-Lのネットワークのみが学習されます。T5XXLは学習されません  flux_train_network.py:50
                    INFO preparing accelerator  train_network.py:335
accelerator device: cuda
                    INFO Building Flux model dev  flux_utils.py:45
                    INFO Loading state dict from I:\neurostuff\ComfyUI\models\unet\FLUX1\consolidated_s6700_fp8.safetensors  flux_utils.py:52
                    INFO Loaded Flux: _IncompatibleKeys(missing_keys=['guidance_in.in_layer.weight', 'guidance_in.in_layer.bias', 'guidance_in.out_layer.weight', 'guidance_in.out_layer.bias'], unexpected_keys=[])  flux_utils.py:55
                    INFO Loaded fp8 FLUX model  flux_train_network.py:80
                    INFO Building CLIP  flux_utils.py:74
                    INFO Loading state dict from I:\neurostuff\ComfyUI\models\clip\clip_l.safetensors  flux_utils.py:167
                    INFO Loaded CLIP:  flux_utils.py:170
                    INFO Loading state dict from I:\neurostuff\ComfyUI\models\clip\t5xxl_fp16.safetensors  flux_utils.py:213
                    INFO Loaded T5xxl:  flux_utils.py:216
                    INFO Building AutoEncoder  flux_utils.py:62
                    INFO Loading state dict from I:\neurostuff\ComfyUI\models\vae\FLUX1\ae.sft  flux_utils.py:66
                    INFO Loaded AE:  flux_utils.py:69
import network module: networks.lora_flux
                    INFO [Dataset 0]  train_util.py:2326
                    INFO caching latents with caching strategy.  train_util.py:984
                    INFO checking cache validity...  train_util.py:994
100%|██████████| 1148/1148 [00:00<00:00, 12551.45it/s]
2024-10-03 18:20:33 INFO no latents to cache  train_util.py:1034
                    INFO move vae and unet to cpu to save memory  flux_train_network.py:187
Traceback (most recent call last):
  File "I:\neurostuff\kohya_ss(f1)\sd-scripts\flux_train_network.py", line 446, in <module>
    trainer.train(args)
  File "I:\neurostuff\kohya_ss(f1)\sd-scripts\train_network.py", line 392, in train
    self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
  File "I:\neurostuff\kohya_ss(f1)\sd-scripts\flux_train_network.py", line 191, in cache_text_encoder_outputs_if_needed
    unet.to("cpu")
  File "I:\neurostuff\fluxgym\env\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\torch\nn\modules\module.py", line 1333, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Traceback (most recent call last):
  File "C:\Python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "I:\neurostuff\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "I:\neurostuff\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "I:\neurostuff\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['I:\neurostuff\fluxgym\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'I:\neurostuff\ComfyUI\models\unet\FLUX1\consolidated_s6700_fp8.safetensors', '--clip_l', 'I:\neurostuff\ComfyUI\models\clip\clip_l.safetensors', '--t5xxl', 'I:\neurostuff\ComfyUI\models\clip\t5xxl_fp16.safetensors', '--ae', 'I:\neurostuff\ComfyUI\models\vae\FLUX1\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '16', '--network_alpha', '16.0', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '1', '--save_every_n_epochs', '1', '--dataset_config', 'I:\neurostuff\fluxgym\outputs\megafignya\dataset.toml', '--output_dir', 'I:\neurostuff\fluxgym\outputs\megafignya', '--output_name', 'megafignya', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2', '--apply_t5_attn_mask', '--weighting_scheme', 'logit_normal', '--logit_mean', '0.0', '--logit_std', '1.0', '--mode_scale', '1.29', '--sigmoid_scale', '1.0', '--enable_bucket', '--bucket_no_upscale', '--min_bucket_reso', '256', '--max_bucket_reso', '1024']' returned non-zero exit status 1.
```
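The load log above points at the likely cause: the de-distilled checkpoint ships without the guidance embedding, so loading it reports `missing_keys=['guidance_in.in_layer.weight', ...]`. If the model skeleton was created on the meta device, those four parameters never receive real storage, and the later `unet.to("cpu")` trips over them. A small diagnostic sketch to confirm which parameters are still meta (here `model` is a hypothetical variable standing in for the loaded Flux model, not sd-scripts code):

```python
# List parameters still on the meta device after load_state_dict();
# with this checkpoint it should name exactly the guidance_in.* entries.
meta_params = [name for name, p in model.named_parameters() if p.is_meta]
print(meta_params)
```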

The default flux.1 dev fp8 model works, and my 16 GB 4070 TiS handles training well. But this new model produces the error above. Can the code be modified to support it, or is it simply incompatible?
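Until a proper fix, one possible workaround sketch (untested; `materialize_meta_params` is a hypothetical helper, not part of sd-scripts) is to give any leftover meta parameters real placeholder storage before the model is moved. Zeros are used as a stand-in here, on the assumption that a de-distilled model does not actually use the guidance embedding:

```python
import torch

def materialize_meta_params(model: torch.nn.Module, device: str = "cpu") -> None:
    """Replace parameters still on the meta device with real, zero-initialized
    storage so that a later model.to(...) has data to copy."""
    for module in model.modules():
        # Collect first so we don't mutate the parameter dict while iterating.
        meta = [(n, p) for n, p in module.named_parameters(recurse=False) if p.is_meta]
        for name, p in meta:
            new_p = torch.nn.Parameter(
                torch.zeros(p.shape, dtype=p.dtype, device=device),
                requires_grad=p.requires_grad,
            )
            setattr(module, name, new_p)  # re-registers the parameter

# e.g. right before the failing call in cache_text_encoder_outputs_if_needed:
# materialize_meta_params(unet)
# unet.to("cpu")  # no longer raises
```

Whether zeroed `guidance_in.*` weights are semantically safe for this checkpoint is untested; this only shows how the exception itself can be avoided.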

li771667481 commented 1 week ago

Did you find a way?

chris9-0 commented 1 week ago

I am facing the same problem.

agwosdz commented 5 days ago

Per the related issue (https://github.com/bmaltais/kohya_ss/issues/2885), a fix has been added in the latest sd-scripts (sd3 branch).

kohya-ss commented 5 days ago

This should be solved in the latest version. If you are still having issues, please share the logs and the model file you are using.