modelscope / facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Apache License 2.0
8.86k stars 834 forks

person model training error #480

Closed: nanaj96 closed this issue 3 months ago

nanaj96 commented 8 months ago

GPU memory (VRAM) is sufficient.

Setting base model to SD1.5
--------uuid: qw
----------work_dir: /content/facechain/worker_data/qw/ly261666/cv_portrait_model/person1
2023-12-23 13:51:02,739 - modelscope - INFO - Use user-specified model revision: v1.0.0
/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn(
2023-12-23 13:51:07,248 - modelscope - INFO - PyTorch version 2.1.0+cu121 Found.
2023-12-23 13:51:07,251 - modelscope - INFO - TensorFlow version 2.15.0 Found.
2023-12-23 13:51:07,251 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-12-23 13:51:07,290 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 1f7ecbe335b689008f5303bd30793944 and a total number of 946 components indexed
2023-12-23 13:51:09.337756: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-23 13:51:09.337810: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-23 13:51:09.339627: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-23 13:51:10.646206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/content/facechain/app.py:1276: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
  output_images = gr.Gallery(label='Output', show_label=False).style(columns=3, rows=2, height=600,
[['/content/facechain/resources/inpaint_template/5.jpg'], ['/content/facechain/resources/inpaint_template/4.jpg'], ['/content/facechain/resources/inpaint_template/2.jpg'], ['/content/facechain/resources/inpaint_template/1.jpg'], ['/content/facechain/resources/inpaint_template/3.jpg']]
/content/facechain/app.py:1379: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
  output_images = gr.Gallery(
[['resources/tryon_garment/garment4.png'], ['resources/tryon_garment/garment1.png'], ['resources/tryon_garment/garment2.png'], ['resources/tryon_garment/garment3.png']]
/content/facechain/app.py:1530: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
  output_images = gr.Gallery(
2023-12-23 13:51:15,736 - modelscope - INFO - Use user-specified model revision: v4.0
2023-12-23 13:51:18,822 - modelscope - INFO - Use user-specified model revision: v1.0.1
Downloading: 100%|████████████████████████████████████████████████████████████████| 121k/121k [00:00<00:00, 2.15MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████| 118/118 [00:00<00:00, 642kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████| 146k/146k [00:00<00:00, 2.52MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████| 217M/217M [00:02<00:00, 94.7MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████| 97.8M/97.8M [00:00<00:00, 111MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████| 12.4k/12.4k [00:00<00:00, 42.0MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████| 51.2M/51.2M [00:00<00:00, 96.8MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████| 4.90k/4.90k [00:00<00:00, 18.6MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████| 104M/104M [00:00<00:00, 117MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████| 76.4k/76.4k [00:00<00:00, 2.24MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████| 82.0k/82.0k [00:00<00:00, 2.31MB/s]
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/import_utils.py", line 450, in _get_module
    requires(module_name_full, requirements)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/import_utils.py", line 353, in requires
    raise ImportError(''.join(failed))
ImportError: modelscope.models.nlp.chatglm2.tokenization requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/content/facechain/facechain/inference.py", line 25, in _data_process_fn_process
    Blipv2()(input_img_dir)
  File "/content/facechain/facechain/data_process/preprocessing.py", line 205, in __init__
    self.skin_retouching = pipeline('skin-retouching-torch', model='damo/cv_unet_skin_retouching_torch', model_revision='v1.0.1')
  File "/usr/local/lib/python3.10/dist-packages/modelscope/pipelines/builder.py", line 163, in pipeline
    clear_llm_info(kwargs)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/pipelines/builder.py", line 227, in clear_llm_info
    from .nlp.llm_pipeline import ModelTypeHelper
  File "/usr/local/lib/python3.10/dist-packages/modelscope/pipelines/nlp/llm_pipeline.py", line 15, in <module>
    from modelscope.models.nlp import ChatGLM2Tokenizer, Llama2Tokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/import_utils.py", line 435, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/import_utils.py", line 434, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/import_utils.py", line 453, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import modelscope.models.nlp.chatglm2.tokenization because of the following error (look up to see its traceback):

modelscope.models.nlp.chatglm2.tokenization requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment.

instance_data_dir /content/facechain/worker_data/qw/training_data/ly261666/cv_portrait_model/person1
project dir: /content/facechain
params: >base_model_path:ly261666/cv_portrait_model, >revision:v2.0, >sub_path:film/film, >output_img_dir:/content/facechain/worker_data/qw/training_data/ly261666/cv_portrait_model/person1, >work_dir:/content/facechain/worker_data/qw/ly261666/cv_portrait_model/person1, >lora_r:4, >lora_alpha:32
The following values were not passed to accelerate launch and had defaults used instead:
  --num_processes was set to a value of 1
  --num_machines was set to a value of 1
  --mixed_precision was set to a value of 'no'
  --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2023-12-23 13:51:42.913890: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-23 13:51:42.913941: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-23 13:51:42.915917: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-23 13:51:44.138992: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-12-23 13:51:44,753 - modelscope - INFO - PyTorch version 2.1.0+cu121 Found.
2023-12-23 13:51:44,755 - modelscope - INFO - TensorFlow version 2.15.0 Found.
2023-12-23 13:51:44,755 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-12-23 13:51:44,793 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 1f7ecbe335b689008f5303bd30793944 and a total number of 946 components indexed
12/23/2023 13:51:46 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

2023-12-23 13:51:47,726 - modelscope - INFO - Use user-specified model revision: v2.0
{'dynamic_thresholding_ratio', 'variance_type', 'clip_sample_range', 'sample_max_value', 'thresholding'} was not found in config. Values will be initialized to default values.
{'force_upcast'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/content/facechain/facechain/train_text_to_image_lora.py", line 1224, in <module>
    main()
  File "/content/facechain/facechain/train_text_to_image_lora.py", line 789, in main
    dataset = load_dataset("imagefolder", data_dir=args.dataset_name)
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2519, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2192, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1735, in dataset_module_factory
    ).get_module()
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1119, in get_module
    patterns = sanitize_patterns(self.data_files) if self.data_files is not None else get_data_patterns(base_path)
  File "/usr/local/lib/python3.10/dist-packages/datasets/data_files.py", line 475, in get_data_patterns
    raise EmptyDatasetError(f"The directory at {base_path} doesn't contain any data files") from None
datasets.data_files.EmptyDatasetError: The directory at /content/facechain/worker_data/qw/training_data/ly261666/cv_portrait_model/person1_labeled doesn't contain any data files
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/facechain/facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=/content/facechain/worker_data/qw/training_data/ly261666/cv_portrait_model/person1', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1.5e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/content/facechain/worker_data/qw/ly261666/cv_portrait_model/person1', '--lora_r=4', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32', '--resume_from_checkpoint=fromfacecommon']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 804, in run
    train_lora_fn(base_model_path=base_model_path,
  File "/content/facechain/app.py", line 207, in train_lora_fn
    raise gr.Error("训练失败 (Training failed)")
gradio.exceptions.Error: '训练失败 (Training failed)'

nanaj96 commented 8 months ago

Do you know how I can fix this error?

billxiang2012 commented 7 months ago

Has it been resolved? I have the same error.

nanaj96 commented 7 months ago

No, it is still the same error. What's happening? Can you help me fix it?

ajiansoft commented 7 months ago

Maybe you can try: pip install sentencepiece
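After installing, it can help to confirm that the package is importable from the same Python environment that runs FaceChain (on Colab, the notebook's own interpreter). The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and is not part of FaceChain or ModelScope.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "sentencepiece" is the module named in the ImportError in the log above.
    not_found = missing_modules(["sentencepiece"])
    if not_found:
        print("Missing:", ", ".join(not_found))
    else:
        print("sentencepiece is importable")
```

If the module is still reported as missing after `pip install sentencepiece`, the install may have gone into a different interpreter than the one running `app.py`.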

lvsh2012 commented 7 months ago

Maybe you can try: pip install sentencepiece

This works; it solved the problem.
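The later `EmptyDatasetError` in the original log points at the `person1_labeled` directory. That directory was likely left empty because the preprocessing process crashed on the missing import, although the thread does not confirm this. A minimal sketch for checking the directory before starting training follows; the function name `count_image_files` and the extension list are assumptions for illustration, not FaceChain code.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual training data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_image_files(directory):
    """Count image files under `directory`, returning 0 if it does not exist."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Path copied from the EmptyDatasetError message in the log above.
    labeled_dir = (
        "/content/facechain/worker_data/qw/training_data/"
        "ly261666/cv_portrait_model/person1_labeled"
    )
    print(f"{count_image_files(labeled_dir)} image file(s) in {labeled_dir}")
```

A count of zero after the data-processing step suggests that preprocessing failed and that its traceback, rather than the training traceback, is the one to investigate.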

sunbaigui commented 3 months ago

Please try out the newest version, facechain-fact, which is train-free with 10s inference.