xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Apache License 2.0

If I test it with 2 concurrent requests, it runs into an error. Error details: #325

Closed · James-Dao closed this 4 days ago

James-Dao commented 3 weeks ago
If I test it with 2 concurrent requests, it runs into an error. Error details:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/./comfyui-xdit/host.py", line 195, in generate_image
    output, elapsed_time = generate_image_parallel(**params)
  File "/app/./comfyui-xdit/host.py", line 110, in generate_image_parallel
    output = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 181, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 133, in data_parallel_fn
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 147, in check_naive_forward_fn
    return self.module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 635, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 349, in encode_prompt
    prompt_embeds = self._get_t5_prompt_embeds(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 225, in _get_t5_prompt_embeds
    untruncated_ids = self.tokenizer_2(prompt, padding="longest", return_tensors="pt").input_ids
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3055, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3142, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3338, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 517, in _batch_encode_plus
    self.set_truncation_and_padding(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 452, in set_truncation_and_padding
    self._tokenizer.no_truncation()
RuntimeError: Already borrowed
[Rank 0] 2024-10-24 04:30:03 - INFO - 127.0.0.1 - - [24/Oct/2024 04:30:03] "POST /generate HTTP/1.1" 500 -
100%|██████████| 28/28 [00:02<00:00, 11.24it/s]
[Rank 0] 2024-10-24 04:30:05 - INFO - Image generation completed in 2.66 seconds
[Rank 0] 2024-10-24 04:30:05 - INFO - Image saved to: /tmp/generated_image_20241024-043005.png
[Rank 0] 2024-10-24 04:30:05 - INFO - 127.0.0.1 - - [24/Oct/2024 04:30:05] "POST /generate HTTP/1.1" 200 -
```

Originally posted by @James-Dao in https://github.com/xdit-project/xDiT/issues/315#issuecomment-2434252708
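
For context on the failure: `RuntimeError: Already borrowed` is raised by the Rust-backed "fast" tokenizers in `transformers` when a single tokenizer instance is called from two threads at once, which is what Flask's threaded dev server does under concurrent requests. Below is a minimal mitigation sketch, not the actual host.py; `run_pipeline` is a hypothetical stand-in for the real `pipe(...)` call:

```python
# A minimal sketch, assuming a Flask server like the poster's host.py:
# serialize pipeline access so the fast tokenizer is never entered by
# two threads at once.
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
pipe_lock = threading.Lock()


def run_pipeline(params: dict) -> str:
    # Hypothetical stand-in for the real pipe(...) invocation.
    return f"image for: {params.get('prompt', '')}"


@app.route("/generate", methods=["POST"])
def generate():
    params = request.get_json()
    # Fast tokenizers wrap a Rust object that cannot be borrowed twice,
    # so only one request may drive the pipeline at a time.
    with pipe_lock:
        result = run_pipeline(params)
    return jsonify({"result": result})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=6000)
```

An alternative is to give each worker thread its own tokenizer copy, but for a single-GPU pipeline the lock is the simpler fix.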

feifeibear commented 3 weeks ago

Could you provide a detailed run script, and make sure you installed with the latest setup.py?

James-Dao commented 3 weeks ago

The version is 0.3.2.

The script is:

```bash
#!/bin/bash

# Target URL
URL="http://localhost:6000/generate"

# Concurrency level
CONCURRENCY_LEVEL=2

# JSON payload
JSON_DATA='{ "prompt": "A lovely rabbit", "num_inference_steps": 28, "save_disk_path": "/tmp" }'

# Fire off the concurrent requests
for ((i=1; i<=CONCURRENCY_LEVEL; i++)); do
  time curl -X POST "$URL" \
    -H "Content-Type: application/json" \
    -d "$JSON_DATA" &
done

# Wait for all background jobs to finish
wait

echo "All concurrent requests have completed."
```

config.json:

```json
{
  "nproc_per_node": 1,
  "model": "/tmp/FLUX.1-schnell",
  "pipefusion_parallel_degree": 1,
  "ulysses_degree": 1,
  "ring_degree": 1,
  "height": 512,
  "width": 512,
  "save_disk_path": "/cfs/dit/output",
  "use_cfg_parallel": false
}
```

James-Dao commented 3 weeks ago

xDiT runs on a single 80 GB GPU card.

feifeibear commented 2 weeks ago

Did you send 2 requests to the service simultaneously? The HTTP server example is a very simple demo. In my opinion, it would be best to put a queue in front of it to handle concurrency.
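
A minimal sketch of the queue feifeibear suggests, again with hypothetical names (`run_pipeline`, the `/generate` route shape) rather than xDiT's actual server code: a single worker thread owns the pipeline, so request handlers never call it concurrently.

```python
# A minimal queue sketch (not xDiT code): one worker thread owns the
# pipeline; HTTP handlers just enqueue jobs and wait for the result.
import queue
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
jobs: "queue.Queue[tuple[dict, queue.Queue]]" = queue.Queue()


def run_pipeline(params: dict) -> str:
    # Hypothetical stand-in for the real pipe(...) call.
    return f"image for: {params.get('prompt', '')}"


def worker() -> None:
    # The only thread that ever touches the pipeline/tokenizer.
    while True:
        params, reply = jobs.get()
        reply.put(run_pipeline(params))


threading.Thread(target=worker, daemon=True).start()


@app.route("/generate", methods=["POST"])
def generate():
    reply: queue.Queue = queue.Queue(maxsize=1)
    jobs.put((request.get_json(), reply))
    return jsonify({"result": reply.get()})  # block until the worker is done


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=6000)
```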

James-Dao commented 2 weeks ago

So my understanding is that it currently doesn't work like vLLM, where concurrency is supported natively just by calling the interface?

feifeibear commented 2 weeks ago

vLLM implements batching for serving. xDiT's HTTP server currently does not, because we found most people use xDiT in ComfyUI, which has a built-in queue. Could you provide some information on how you want to use xDiT? We will consider implementing a batch scheduler if there is real demand.
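
For reference, the batch scheduler feifeibear mentions could be a small extension of the queue above: drain whatever requests are waiting and run them as one batched pipeline call. A hedged sketch, assuming the pipeline accepts a list of prompts (as diffusers pipelines generally do); `MAX_BATCH` and `run_pipeline_batch` are illustrative names, not xDiT APIs:

```python
# A micro-batching sketch (not xDiT code): collect up to MAX_BATCH queued
# prompts and run them in a single batched pipeline call.
import queue
import threading

MAX_BATCH = 4
jobs: "queue.Queue[tuple[str, queue.Queue]]" = queue.Queue()


def run_pipeline_batch(prompts: list) -> list:
    # Hypothetical stand-in for a batched pipe(prompt=prompts, ...) call.
    return [f"image for: {p}" for p in prompts]


def batch_worker() -> None:
    while True:
        batch = [jobs.get()]             # block for the first job
        while len(batch) < MAX_BATCH:    # then grab whatever else is waiting
            try:
                batch.append(jobs.get_nowait())
            except queue.Empty:
                break
        images = run_pipeline_batch([prompt for prompt, _ in batch])
        for (_, reply), image in zip(batch, images):
            reply.put(image)


threading.Thread(target=batch_worker, daemon=True).start()
```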

James-Dao commented 2 weeks ago

We produce images through conversation.

feifeibear commented 4 days ago

We have fixed the concurrent access error in #359.