modelscope / facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Apache License 2.0
8.88k stars, 833 forks

Help needed: training fails in the SD extension on Windows #422

Closed: theussong closed this issue 3 months ago

theussong commented 10 months ago

After uploading the images and starting training, I get the following error:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.6.0-2-g4afaaf8a
Commit hash: 4afaaf8a020c1df457bcf7250cb1c7f609699fa7
--installing mmcv...
Installing requirements for mmcv
Couldn't install requirements for mmcv.
Command: "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\python.exe" -m pip install mmcv-full==1.7.0 --prefer-binary
Error code: 1
stdout: Looking in indexes: https://mirror.baidu.com/pypi/simple
Looking in links: https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html
Collecting mmcv-full==1.7.0
Using cached https://mirror.baidu.com/pypi/packages/a1/81/89120850923f4c8b49efba81af30160e7b1b305fdfa9671a661705a8abbf/mmcv-full-1.7.0.tar.gz (593 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'

stderr: ERROR: No .egg-info directory found in C:\Users\zhuga\AppData\Local\Temp\pip-pip-egg-info-a9871d0t

ERROR facechain: failed to install mmcv, make sure to have "CUDA Toolkit" and "Build Tools for Visual Studio" installed
is_installed check for tensorflow-cpu failed as 'spec is None'
Installing requirements for easyphoto-webui
Installing requirements for tensorflow
Installing requirements for easyphoto-webui
Installing requirements for invisible-watermark
Launching Web UI with arguments: --theme dark --port 7861 --xformers --api --autolaunch --share --server-name 10.0.50.33

========================= a1111-sd-webui-lycoris =========================
Starting from stable-diffusion-webui version 1.5.0
a1111-sd-webui-lycoris extension is no longer needed

All its features have been integrated into the native LoRA extension
LyCORIS models can now be used as if there are regular LoRA models

This extension has been automatically deactivated
Please remove this extension

Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
2023-11-10 16:18:01,202 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2023-11-10 16:18:01,205 - modelscope - INFO - TensorFlow version 2.14.0 Found.
2023-11-10 16:18:01,205 - modelscope - INFO - Loading ast index from C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4.cache\modelscope\hub\ast_indexer
2023-11-10 16:18:01,290 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 5fedc4aff6233fa2760da71fa82bf83e and a total number of 943 components indexed
[AddNet] Updating model hashes...
[AddNet] Updating model hashes...
2023-11-10 16:18:02,627 - ControlNet - INFO - ControlNet v1.1.416
ControlNet preprocessor location: C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\sd-webui-controlnet\annotator\downloads
2023-11-10 16:18:02,740 - ControlNet - INFO - ControlNet v1.1.416
sd-webui-prompt-all-in-one background API service started successfully.
Loading weights [d7e2ac2f4a] from C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\models\Stable-diffusion\majicMIX realistic 麦橘写实_v2威力加强典藏版.safetensors
Creating model from config: C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\configs\v1-inference.yaml
[['C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/resources/inpaint_template\1.jpg'], ['C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/resources/inpaint_template\2.jpg'], ['C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/resources/inpaint_template\3.jpg'], ['C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/resources/inpaint_template\4.jpg'], ['C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/resources/inpaint_template\5.jpg']]
[]
Running on local URL: http://10.0.50.33:7861
Applying attention optimization: xformers... done.
Model loaded in 2.2s (load weights from disk: 0.5s, load config: 0.2s, create model: 0.2s, apply weights to model: 1.0s).

Could not create share link. Missing file: C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio\frpc_windows_amd64_v0.2.

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps:

  1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_windows_amd64.exe
  2. Rename the downloaded file to: frpc_windows_amd64_v0.2
  3. Move the file to this location: C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio

Startup time: 47.8s (prepare environment: 11.5s, import torch: 4.9s, import gradio: 0.8s, setup paths: 0.6s, initialize shared: 0.2s, other imports: 0.4s, load scripts: 3.8s, create ui: 1.7s, gradio launch: 23.3s, app_started_callback: 0.4s).
显存足够 (GPU memory is sufficient)
--------uuid: qw
----------work_dir: C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\ly261666/cv_portrait_model\person1
2023-11-10 16:23:24,486 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2023-11-10 16:23:24,490 - modelscope - INFO - TensorFlow version 2.14.0 Found.
2023-11-10 16:23:24,490 - modelscope - INFO - Loading ast index from C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4.cache\modelscope\hub\ast_indexer
2023-11-10 16:23:24,615 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 5fedc4aff6233fa2760da71fa82bf83e and a total number of 943 components indexed
2023-11-10 16:23:26,405 - modelscope - INFO - Use user-specified model revision: v4.0
2023-11-10 16:34:11,228 - modelscope - WARNING - Download file from: 1342177280 to: 1509949439 failed, will retry
2023-11-10 16:35:43,082 - modelscope - WARNING - Download file from: 2348810240 to: 2516582399 failed, will retry
2023-11-10 16:37:39,128 - modelscope - WARNING - Download file from: 3187671040 to: 3355443199 failed, will retry
2023-11-10 16:38:44,592 - modelscope - ERROR - File C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4.cache\modelscope\hub\temp\tmpgqa97qz6\diffusion_pytorch_model.bin integrity check failed, the download may be incomplete, please try again.
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\facechain\inference.py", line 24, in _data_process_fn_process
    Blipv2()(input_img_dir)
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\facechain\data_process\preprocessing.py", line 204, in __init__
    self.model = DeepDanbooru()
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\facechain\data_process\deepbooru.py", line 721, in __init__
    snapshot_path = snapshot_download(foundation_model_id, revision='v4.0')
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\modelscope\hub\snapshot_download.py", line 159, in snapshot_download
    file_integrity_validation(temp_file, model_file[FILE_HASH])
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\modelscope\hub\utils\utils.py", line 94, in file_integrity_validation
    raise FileIntegrityError(msg)
modelscope.hub.errors.FileIntegrityError: File C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4.cache\modelscope\hub\temp\tmpgqa97qz6\diffusion_pytorch_model.bin integrity check failed, the download may be incomplete, please try again.
instance_data_dir C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\training_data\ly261666/cv_portrait_model\person1
'accelerate' is not recognized as an internal or external command, operable program or batch file.
Error executing the command: Command '['accelerate', 'launch', 'C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\training_data\ly261666/cv_portrait_model\person1', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1.5e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\ly261666/cv_portrait_model\person1', '--lora_r=4', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32', '--resume_from_checkpoint=fromfacecommon']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\app.py", line 111, in train_lora_fn
    subprocess.run(command, check=True)
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['accelerate', 'launch', 'C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain/facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\training_data\ly261666/cv_portrait_model\person1', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1.5e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\worker_data\qw\ly261666/cv_portrait_model\person1', '--lora_r=4', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32', '--resume_from_checkpoint=fromfacecommon']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\python\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\app.py", line 691, in run
    train_lora_fn(base_model_path=base_model_path,
  File "C:\Users\zhuga\stable-diffusion\sd-webui-aki-v4.4\extensions\facechain\app.py", line 114, in train_lora_fn
    raise gr.Error("训练失败 (Training failed)")
gradio.exceptions.Error: '训练失败 (Training failed)'
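
The log above shows several "failed, will retry" warnings followed by a FileIntegrityError, which suggests an interrupted download rather than a code problem. One way to cope with that is to repeat the whole download a few times. The sketch below is only an illustration: `retry` is a hypothetical helper, not part of modelscope or facechain, and the commented usage assumes the `snapshot_download` and `FileIntegrityError` names that appear in the traceback.

```python
import time


def retry(fn, attempts=3, delay_seconds=0.0, retry_on=(Exception,)):
    """Call fn() until it succeeds or the attempts run out.

    Hypothetical helper for illustration; re-raises the last error on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as error:
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise last_error


# Intended use (not executed here; needs modelscope and network access).
# "<model-id>" is a placeholder, the real id is not shown in the log.
# from modelscope.hub.snapshot_download import snapshot_download
# from modelscope.hub.errors import FileIntegrityError
# path = retry(lambda: snapshot_download("<model-id>", revision="v4.0"),
#              attempts=3, delay_seconds=5, retry_on=(FileIntegrityError,))
```

Restricting `retry_on` to the integrity error keeps unrelated failures, such as a wrong model id, from being retried pointlessly.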

wenmengzhou commented 10 months ago

  1. ERROR facechain: failed to install mmcv, make sure to have "CUDA Toolkit" and "Build Tools for Visual Studio" installed. Please install these tools so that mmcv can be compiled.
  2. In the code, change accelerate launch to python.
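
The second suggestion can be sketched as a small fallback: use `accelerate launch` when the `accelerate` executable is on PATH, otherwise run the training script with the interpreter that hosts the web UI. This is a hedged illustration, not facechain's code; `resolve_launcher` and the relative script path are hypothetical, and running with plain `python` assumes a single-process (single-GPU) setup.

```python
import shutil
import sys


def resolve_launcher(which=shutil.which):
    """Return the command prefix used to start the training script.

    `which` is injectable only so that the fallback can be exercised in tests.
    """
    if which("accelerate") is not None:
        # The accelerate console script is reachable on PATH.
        return ["accelerate", "launch"]
    # Fallback: the interpreter running this process.
    return [sys.executable]


# Hypothetical script path and argument, shown only to illustrate usage.
command = resolve_launcher() + [
    "facechain/train_text_to_image_lora.py",
    "--resolution=512",
]
print(command)
```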
ArnoQY commented 9 months ago

This is a classic bug when running under a venv: the subprocess.run call in app.py does not pass os.environ, so the subprocess ends up using the global environment. I don't know whether facechain changes PATH in train_text_to_image_lora.py, so for now I use subprocess.Popen as an interim workaround, although this is not safe for the main process.

On line 88 of app.py, I changed the Windows command to:
f'{%your-stable-diffusion-dir%/venv/Scripts/python.exe}', f'{project_dir}/facechain/train_text_to_image_lora.py',

On line 112 of app.py, I changed the code inside 'try' to:
my_env = os.environ
subprocess.Popen(command, env=my_env)

This worked for me. The problem has been discussed in more detail here. I hope the maintainers can handle this special scenario in stable-diffusion-webui later.
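
A variant of this workaround that keeps the blocking `subprocess.run(..., check=True)` behaviour is sketched below. It copies the parent environment, prepends the interpreter's own directories to PATH, and passes the result explicitly via `env=`. `build_child_env` is a hypothetical helper, the `Scripts` subfolder is an assumption about where console scripts live on Windows, and the child process here is only a stand-in for the training script.

```python
import os
import subprocess
import sys


def build_child_env(extra_path_dirs):
    """Copy the parent environment and prepend directories to PATH."""
    env = os.environ.copy()  # copy, so os.environ itself is not modified
    current = env.get("PATH", "")
    parts = list(extra_path_dirs) + ([current] if current else [])
    env["PATH"] = os.pathsep.join(parts)
    return env


# Candidate locations of console scripts such as accelerate (assumption).
interpreter_dir = os.path.dirname(sys.executable)
child_env = build_child_env(
    [interpreter_dir, os.path.join(interpreter_dir, "Scripts")]
)

# Stand-in child process: prints the first PATH entry it sees.
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['PATH'].split(os.pathsep)[0])"],
    env=child_env,
    check=True,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```

Unlike `Popen` without a wait, `subprocess.run` with `check=True` blocks until the child exits and raises if it fails, so the caller still learns whether training succeeded.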

liuyhwangyh commented 8 months ago

Please install mmcv with:
mim install mmcv-full==1.7.2
ref: https://mmcv.readthedocs.io/en/latest/get_started/installation.html
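
After installing, it can help to confirm that the web UI's interpreter actually sees the package. The sketch below uses only the standard library; `is_installed` and `installed_version` are hypothetical helpers written for this example (the first mirrors the "spec is None" check mentioned in the startup log), and it should be run with the same python.exe that launches the web UI.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name):
    """True if the current interpreter can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def installed_version(distribution_name):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print("mmcv importable:", is_installed("mmcv"))
print("mmcv-full version:", installed_version("mmcv-full"))
```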

dongxiaoke commented 7 months ago

In the facechain project, edit install.py #32 so that it installs mmcv-full==1.7.2

dongxiaoke commented 7 months ago

In the facechain project, edit line 32 of install.py so that it installs mmcv-full==1.7.2

sunbaigui commented 3 months ago

Please try out the newest version, facechain-fact, which is train-free and runs inference in 10 s.