xunfeng2zkj commented 1 year ago

kamalkraj commented 1 year ago

@xunfeng2zkj Which branch are you using, and which diffusers version is inside the container?

xunfeng2zkj commented 1 year ago

The v2 branch. I tried both the pinned pip packages (diffusers==0.11.1, transformers==4.25.1) and the defaults in nvcr.io/nvidia/tritonserver:22.06-py3.
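One way to answer the version question from inside the container is to ask Python's package metadata directly. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the packages mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list: the libraries discussed in this issue.
    report = installed_versions(["diffusers", "transformers", "huggingface_hub"])
    for name, version in report.items():
        print(f"{name}=={version}" if version else f"{name} is not installed")
```

Running it with the same interpreter that Triton's Python backend uses shows which versions are actually loaded there, which can differ from a local virtual environment.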

xunfeng2zkj commented 1 year ago

These are the ONNX conversion logs:

Fetching 15 files: 100%|████████████████████████████████████████| 15/15 [00:00<00:00, 12345.87it/s]
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:754: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask.fill_(torch.tensor(torch.finfo(dtype).min))
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:280: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:288: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:320: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:5408: UserWarning: Exporting aten::index operator of advanced indexing in opset 16 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:369: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:182: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:470: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/vae.py:570: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/diffusers/models/vae.py:607: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
/home/lzcs/Documents/code/tools/stable-diffusion-tritonserver/venv/lib/python3.10/site-packages/torch/onnx/utils.py:617: UserWarning: ONNX Preprocess - Removing mutation from node aten::index_put_ on block input: '1'. This changes graph semantics. (Triggered internally at ../torch/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp:335.)
  _C._jit_pass_onnx_remove_inplace_ops_for_onnx(graph, module)
ONNX pipeline saved to stable-diffusion-onnx
ONNX pipeline is loadable

xunfeng2zkj commented 1 year ago

=============================
== Triton Inference Server ==

NVIDIA Release 22.06 (build 39726160)
Triton Server Version 2.23.0

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0119 06:05:07.785478 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fbebe000000' with size 268435456
I0119 06:05:07.785635 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0119 06:05:07.786661 1 model_repository_manager.cc:1191] loading: text_encoder:1
I0119 06:05:07.887112 1 model_repository_manager.cc:1191] loading: stable_diffusion:1
I0119 06:05:07.902154 1 onnxruntime.cc:2466] TRITONBACKEND_Initialize: onnxruntime
I0119 06:05:07.902221 1 onnxruntime.cc:2476] Triton TRITONBACKEND API version: 1.10
I0119 06:05:07.902249 1 onnxruntime.cc:2482] 'onnxruntime' TRITONBACKEND API version: 1.10
I0119 06:05:07.902264 1 onnxruntime.cc:2512] backend configuration: {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0119 06:05:07.921227 1 onnxruntime.cc:2568] TRITONBACKEND_ModelInitialize: text_encoder (version 1)
I0119 06:05:07.921968 1 onnxruntime.cc:2611] TRITONBACKEND_ModelInstanceInitialize: text_encoder (GPU device 0)
I0119 06:05:07.987297 1 model_repository_manager.cc:1191] loading: vae_decoder:1
I0119 06:05:08.087460 1 model_repository_manager.cc:1191] loading: unet:1
I0119 06:05:08.907431 1 model_repository_manager.cc:1345] successfully loaded 'text_encoder' version 1
I0119 06:05:08.917194 1 onnxruntime.cc:2568] TRITONBACKEND_ModelInitialize: vae_decoder (version 1)
I0119 06:05:08.917444 1 onnxruntime.cc:2568] TRITONBACKEND_ModelInitialize: unet (version 1)
I0119 06:05:08.917712 1 onnxruntime.cc:2611] TRITONBACKEND_ModelInstanceInitialize: unet (GPU device 0)
I0119 06:05:11.131345 1 onnxruntime.cc:2611] TRITONBACKEND_ModelInstanceInitialize: vae_decoder (GPU device 0)
I0119 06:05:11.131984 1 model_repository_manager.cc:1345] successfully loaded 'unet' version 1
I0119 06:05:11.252130 1 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: stable_diffusion (GPU device 0)
I0119 06:05:11.252149 1 model_repository_manager.cc:1345] successfully loaded 'vae_decoder' version 1
0119 06:05:12.212904 116 pb_stub.cc:309] Failed to initialize Python stub: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/stable_diffusion/1/stable_diffusion/1/tokenizer/'. Use repo_type argument if needed.

At:
  /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(166): validate_repo_id
  /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(114): _inner_fn
  /usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py(466): cached_file
  /usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py(1760): from_pretrained
  /models/stable_diffusion/1/model.py(70): initialize

E0119 06:05:12.355376 1 model_repository_manager.cc:1348] failed to load 'stable_diffusion' version 1: Internal: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/stable_diffusion/1/stable_diffusion/1/tokenizer/'. Use repo_type argument if needed.

At:
  /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(166): validate_repo_id
  /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(114): _inner_fn
  /usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py(466): cached_file
  /usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py(1760): from_pretrained
  /models/stable_diffusion/1/model.py(70): initialize

I0119 06:05:12.355465 1 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0119 06:05:12.355497 1 server.cc:583]
+-------------+-----------------------------------------------------------------+----------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+----------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+----------------------------------------------------------------+

I0119 06:05:12.355545 1 server.cc:626]
+------------------+---------+----------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+----------------------------------------------------------------+
| stable_diffusion | 1 | UNAVAILABLE: Internal: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/stable_diffusion/1/stable_diffusion/1/tokenizer/'. Use repo_type argument if needed. |
| | | At: |
| | | /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(166): validate_repo_id |
| | | /usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py(114): _inner_fn |
| | | /usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py(466): cached_file |
| | | /usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py(1760): from_pretrained |
| | | /models/stable_diffusion/1/model.py(70): initialize |
| text_encoder | 1 | READY |
| unet | 1 | READY |
| vae_decoder | 1 | READY |
+------------------+---------+----------------------------------------------------------------+

I0119 06:05:12.380740 1 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060
I0119 06:05:12.380895 1 tritonserver.cc:2159]
+----------------------------------+----------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------+
| server_id | triton |
| server_version | 2.23.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /models/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------+

I0119 06:05:12.380914 1 server.cc:257] Waiting for in-flight requests to complete.
I0119 06:05:12.380921 1 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0119 06:05:12.380925 1 model_repository_manager.cc:1223] unloading: unet:1
I0119 06:05:12.380945 1 model_repository_manager.cc:1223] unloading: text_encoder:1
I0119 06:05:12.380966 1 model_repository_manager.cc:1223] unloading: vae_decoder:1
I0119 06:05:12.380995 1 server.cc:288] All models are stopped, unloading models
I0119 06:05:12.380999 1 server.cc:295] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0119 06:05:12.381028 1 onnxruntime.cc:2645] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0119 06:05:12.381028 1 onnxruntime.cc:2645] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0119 06:05:12.381068 1 onnxruntime.cc:2645] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0119 06:05:12.393729 1 onnxruntime.cc:2591] TRITONBACKEND_ModelFinalize: delete model state
I0119 06:05:12.393754 1 model_repository_manager.cc:1328] successfully unloaded 'vae_decoder' version 1
I0119 06:05:12.406201 1 onnxruntime.cc:2591] TRITONBACKEND_ModelFinalize: delete model state
I0119 06:05:12.406409 1 model_repository_manager.cc:1328] successfully unloaded 'text_encoder' version 1
I0119 06:05:12.433237 1 onnxruntime.cc:2591] TRITONBACKEND_ModelFinalize: delete model state
I0119 06:05:12.433264 1 model_repository_manager.cc:1328] successfully unloaded 'unet' version 1
I0119 06:05:13.381187 1 server.cc:295] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

kamalkraj commented 1 year ago

@xunfeng2zkj Thanks for sharing the info. Could you please also share the model name so I can reproduce the issue?

xunfeng2zkj commented 1 year ago

@kamalkraj Thank you for answering so quickly. For the whole pipeline I only followed the v2 README.

xunfeng2zkj commented 1 year ago

runwayml/stable-diffusion-v1-5

kamalkraj commented 1 year ago

@xunfeng2zkj I am unable to reproduce the issue. Please clone the repo again and retry.

xunfeng2zkj commented 1 year ago

Thanks. Is it related to permission verification? "Internal: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/stable_diffusion/1/stable_diffusion/1/tokenizer/'. Use repo_type argument if needed"
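The message probably does not point at permissions. Judging by the traceback (`from_pretrained`, then `cached_file`, then `validate_repo_id`), and as far as I understand transformers, a string that is not an existing local directory is treated as a Hub repo id, and a filesystem path fails that validation. The sketch below is a hypothetical guard, not code from the repository: `require_local_dir` is an assumed helper name, meant to be called before passing a path to `from_pretrained` so that a wrong path fails with a clearer error.

```python
import os


def require_local_dir(path):
    """Return the normalized path if it is an existing directory, else raise.

    Hypothetical helper: a missing directory raises FileNotFoundError here
    instead of being interpreted later as a Hub repo id.
    """
    normalized = os.path.normpath(path)
    if not os.path.isdir(normalized):
        raise FileNotFoundError(
            f"Expected a local directory but found none at: {normalized}"
        )
    return normalized
```

With this guard, the doubled `stable_diffusion/1/stable_diffusion/1` segment in the failing path would surface as a missing directory, which is easier to diagnose than a repo id validation error.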

kamalkraj commented 1 year ago

Please share a screenshot of the tree view of the models folder.

xunfeng2zkj commented 1 year ago

tree.txt

xunfeng2zkj commented 1 year ago

├── convert_stable_diffusion_checkpoint_to_onnx.py
├── copy_files.sh
├── Dockerfile
├── Inference.ipynb
├── LICENSE
├── models
│   ├── stable_diffusion
│   │   ├── 1
│   │   │   ├── model.py
│   │   │   ├── __pycache__
│   │   │   │   └── model.cpython-38.pyc
│   │   │   ├── scheduler
│   │   │   │   └── scheduler_config.json
│   │   │   └── tokenizer
│   │   │       ├── merges.txt
│   │   │       ├── special_tokens_map.json
│   │   │       ├── tokenizer_config.json
│   │   │       └── vocab.json
│   │   └── config.pbtxt
│   ├── text_encoder
│   │   ├── 1
│   │   │   └── model.onnx
│   │   └── config.pbtxt
│   ├── unet
│   │   ├── 1
│   │   │   └── model.onnx
│   │   └── config.pbtxt
│   └── vae_decoder
│       ├── 1
│       │   └── model.onnx
│       └── config.pbtxt
├── README.md
├── requirements.txt
├── stable-diffusion-onnx
│   ├── feature_extractor
│   │   └── preprocessor_config.json
│   ├── model_index.json
│   ├── safety_checker
│   │   └── model.onnx
│   ├── scheduler
│   │   └── scheduler_config.json
│   ├── text_encoder
│   │   └── model.onnx
│   ├── tokenizer
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── unet
│   │   └── model.onnx
│   ├── vae_decoder
│   │   └── model.onnx
│   └── vae_encoder
│       └── model.onnx
└── tree.txt

xunfeng2zkj commented 1 year ago

OK, I solved it by setting self.scheduler_config_path = current_name + "/scheduler/". But now there is a new problem: "Stub process is unhealthy and it will be restarted."
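A related way to avoid this class of path error is to derive the asset directories from the location of model.py itself rather than concatenating strings. This is a minimal sketch, assuming the tokenizer/ and scheduler/ folders sit next to model.py as in the tree listing earlier in the thread; `asset_dirs` is a hypothetical helper name and is not taken from the repository.

```python
from pathlib import Path


def asset_dirs(model_file):
    """Return the tokenizer and scheduler directories that sit next to model.py."""
    # The version directory is the folder containing model.py,
    # e.g. models/stable_diffusion/1 in the layout shown in this thread.
    version_dir = Path(model_file).resolve().parent
    return {
        "tokenizer": version_dir / "tokenizer",
        "scheduler": version_dir / "scheduler",
    }
```

Inside model.py this could be called as `asset_dirs(__file__)`, and the resulting paths converted with `str(...)` where a library expects a plain string. Anchoring on the file's own location keeps the paths correct regardless of the server's working directory.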

SingL3 commented 1 year ago

@xunfeng2zkj Hi, did you solve the "Stub process is unhealthy and it will be restarted." error? How did you deal with it?

kamalkraj / stable-diffusion-tritonserver

UNAVAILABLE: Internal: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/stable_diffusion/1/stable_diffusion/1/tokenizer/'. Use `repo_type` argument if needed. #6
