chengyinglie closed this issue 5 months ago.
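It looks like gradient tracking is not enabled on the VAE latents during LoRA training.
Right after the latents are obtained (around line 1100), add:
latents.requires_grad_(True)
That should fix it.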
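A minimal PyTorch sketch of why this workaround silences the error, assuming the underlying cause is that nothing in the forward pass requires grad. The tiny frozen `torch.nn.Linear` is a hypothetical stand-in for the UNet, not FaceChain code. With no grad-enabled inputs or parameters, `loss` has no `grad_fn` and `backward()` raises the same RuntimeError; after `latents.requires_grad_(True)` the backward pass runs, but gradients reach only the latents, not the frozen weights.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a UNet whose trainable (LoRA) weights are missing:
# every parameter is frozen.
model = torch.nn.Linear(4, 4)
model.requires_grad_(False)

# VAE latents are normally produced without grad tracking.
latents = torch.randn(2, 4)
loss = model(latents).pow(2).mean()

error_message = ""
try:
    loss.backward()
except RuntimeError as err:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    error_message = str(err)

# The suggested workaround: enable grad on the latents, then recompute the loss.
latents.requires_grad_(True)
loss = model(latents).pow(2).mean()
loss.backward()  # no longer raises

print("latents.grad set:", latents.grad is not None)      # True
print("weight.grad set:", model.weight.grad is not None)  # False
```

If the real LoRA weights are also frozen or never attached, this change may let training run without updating them, so it may be worth checking that some parameters actually require grad.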
Which file is that, train_text_to_image_lora.py or app.py? I couldn't find it. Thanks!
Training failed: how do I install the mmcv package on Linux?
Process Process-1:
Traceback (most recent call last):
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 210, in build_from_cfg
return obj_cls._instantiate(**args)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/base/base_model.py", line 67, in _instantiate
return cls(**kwargs)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/cv/face_detection/scrfd/damofd_detect.py", line 31, in __init__
super().__init__(model_dir, **kwargs)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/cv/face_detection/scrfd/scrfd_detect.py", line 33, in __init__
from mmcv import Config
ModuleNotFoundError: No module named 'mmcv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
return obj_cls(**args)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/cv/face_detection_pipeline.py", line 36, in __init__
super().__init__(model=model, **kwargs)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/base.py", line 100, in __init__
self.model = self.initiate_single_model(model, **kwargs)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/base.py", line 53, in initiate_single_model
return Model.from_pretrained(
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/base/base_model.py", line 183, in from_pretrained
model = build_model(model_cfg, task_name=task_name)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/builder.py", line 35, in build_model
model = build_from_cfg(
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
ModuleNotFoundError: DamoFdDetect: No module named 'mmcv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/saizong/stable-diffusion/extensions/facechain/facechain/inference.py", line 25, in _data_process_fn_process
Blipv2()(input_img_dir)
File "/home/saizong/stable-diffusion/extensions/facechain/facechain/data_process/preprocessing.py", line 207, in __init__
self.face_detection = pipeline(task=Tasks.face_detection, model='damo/cv_ddsar_face-detection_iclr23-damofd', model_revision='v1.1')
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/builder.py", line 170, in pipeline
return build_pipeline(cfg, task_name=task)
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/builder.py", line 65, in build_pipeline
return build_from_cfg(
File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
ModuleNotFoundError: FaceDetectionPipeline: DamoFdDetect: No module named 'mmcv'
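The innermost frame fails on `from mmcv import Config`. As far as I know, `Config` belongs to the mmcv 1.x API (the `mmcv-full` package) and moved to `mmengine` in mmcv 2.x, so installing a 2.x release may not satisfy this particular import; OpenMMLab's `mim` installer (`pip install -U openmim`, then `mim install` with the mmcv package you need) is the usual route, but check the mmcv docs for a version matching your torch/CUDA. A small sketch for confirming the install before retrying training; the helper name `check_mmcv` is hypothetical.

```python
import importlib
import importlib.util


def check_mmcv() -> str:
    """Report whether mmcv is importable and provides `Config` (mmcv 1.x API)."""
    if importlib.util.find_spec("mmcv") is None:
        return "mmcv is not installed"
    try:
        mmcv = importlib.import_module("mmcv")
    except Exception as err:  # installed but broken, e.g. missing compiled ops
        return f"mmcv is installed but failed to import: {err}"
    version = getattr(mmcv, "__version__", "unknown")
    if not hasattr(mmcv, "Config"):
        return f"mmcv {version} is installed but has no Config (likely 2.x)"
    return f"mmcv {version} provides Config"


print(check_mmcv())
```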
I resolved the error by adding "loss.requires_grad = True" after line 1033 in train_text_to_image_lora.py:
train_loss += avg_loss.item() / args.gradient_accumulation_steps
loss.requires_grad = True
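A small PyTorch sketch of what this line does, again using a hypothetical frozen `torch.nn.Linear` in place of the real model. When `loss` has no `grad_fn` it is a leaf tensor, so setting `loss.requires_grad = True` is allowed and `backward()` stops raising; however, the backward pass then ends at `loss` itself and no model weight receives a gradient. If the same situation occurs in train_text_to_image_lora.py, the error would disappear while the LoRA weights stay unchanged.

```python
import torch

# Hypothetical frozen model standing in for the UNet; nothing is trainable.
model = torch.nn.Linear(4, 4)
model.requires_grad_(False)
latents = torch.randn(2, 4)

loss = model(latents).pow(2).mean()
assert loss.grad_fn is None  # no autograd graph was recorded

loss.requires_grad = True    # permitted because loss is a leaf tensor
loss.backward()              # runs without the RuntimeError

print(loss.grad)                  # tensor(1.)
print(model.weight.grad is None)  # True: the weights got no gradient
```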
OK, thanks a lot!
I added this, but a new problem happened.
0%|████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.09it/s]
Traceback (most recent call last):
File "/root/facechain-main/facechain/train_text_to_image_lora.py", line 1225, in <module>
main()
File "/root/facechain-main/facechain/train_text_to_image_lora.py", line 1211, in main
pipeline.unet.load_attn_procs(args.output_dir)
File "/root/miniconda3/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/loaders/unet.py", line 297, in load_attn_procs
raise ValueError(f"Module {key} is not a LoRACompatibleConv or LoRACompatibleLinear module.")
ValueError: Module down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q is not a LoRACompatibleConv or LoRACompatibleLinear module.
I ran into this problem too.
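The `ValueError` above is raised inside diffusers' `load_attn_procs`, so the installed diffusers release may matter (a guess, not confirmed in this thread). A stdlib-only sketch for collecting package versions to include when reporting such errors; the helper name `package_versions` and the package list are illustrative assumptions.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(package_versions(["torch", "diffusers", "peft", "accelerate", "modelscope"]))
```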
Please try the newest version, facechain-fact, which is training-free and runs inference in about 10 seconds.
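Environment: Alibaba Cloud (aliyun) PAI-DSW, image modelscope:1.10.0-pytorch2.1.0tensorflow2.14.0-gpu-py310. I followed the setup steps and everything succeeded, apart from a few incompatibility errors. After uploading the images and clicking Train, all model downloads worked fine, but then the UI showed ERROR. The backend log is as follows: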
Traceback (most recent call last):
File "/mnt/workspace/facechain/facechain/train_text_to_image_lora.py", line 1224, in <module>
main()
File "/mnt/workspace/facechain/facechain/train_text_to_image_lora.py", line 1036, in main
accelerator.backward(loss)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1989, in backward
loss.backward(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Steps: 0%| | 0/200 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 994, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/mnt/workspace/facechain/facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=/mnt/workspace/facechain/worker_data/qw/training_data/ly261666/cv_portrait_model/person1', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1.5e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/mnt/workspace/facechain/worker_data/qw/ly261666/cv_portrait_model/person1', '--lora_r=4', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32', '--resume_from_checkpoint=fromfacecommon']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "/opt/conda/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/mnt/workspace/facechain/app.py", line 804, in run
train_lora_fn(base_model_path=base_model_path,
File "/mnt/workspace/facechain/app.py", line 207, in train_lora_fn
raise gr.Error("训练失败 (Training failed)")
gradio.exceptions.Error: '训练失败 (Training failed)'
Could anyone please take a look?
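This log ends in the same `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`. One way to narrow it down is to count the parameters that require grad right before `accelerator.backward(loss)`: if the count is zero, the loss cannot have a `grad_fn`. A minimal sketch with a hypothetical helper `count_trainable_params` and a toy module in place of the UNet:

```python
import torch


def count_trainable_params(module: torch.nn.Module) -> int:
    """Number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Toy model: a frozen base layer plus a small trainable adapter (hypothetical).
base = torch.nn.Linear(8, 8)
base.requires_grad_(False)
adapter = torch.nn.Linear(8, 8, bias=False)
model = torch.nn.Sequential(base, adapter)

print(count_trainable_params(model))  # 64: only the adapter weight

adapter.requires_grad_(False)
print(count_trainable_params(model))  # 0: backward() would now fail
```

In the training script this check would presumably go on the UNet (and the text encoder, if it is trained) after the LoRA layers are added; that placement is an assumption, not something confirmed in this thread.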