Could you provide detailed run scripts and make sure you installed with the latest setup.py?
The version is 0.3.2.
The script is:
URL="http://localhost:6000/generate"
CONCURRENCY_LEVEL=2
JSON_DATA='{ "prompt": "A lovely rabbit", "num_inference_steps": 28, "save_disk_path": "/tmp" }'
for ((i=1; i<=CONCURRENCY_LEVEL; i++)); do
  time curl -X POST "$URL" \
    -H "Content-Type: application/json" \
    -d "$JSON_DATA" &
done
wait
echo "所有并发请求已完成。"
config.json:
{
  "nproc_per_node": 1,
  "model": "/tmp/FLUX.1-schnell",
  "pipefusion_parallel_degree": 1,
  "ulysses_degree": 1,
  "ring_degree": 1,
  "height": 512,
  "width": 512,
  "save_disk_path": "/cfs/dit/output",
  "use_cfg_parallel": false
}
xDiT runs on one 80GB GPU card.
Did you send 2 requests to the service simultaneously? The HTTP server example is a very simple demo. It would be best if you had a queue for concurrency, in my opinion (a rough sketch of that idea follows).
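For illustration only, here is a minimal sketch of serializing requests, not the actual xDiT host.py; the pipe object, its construction, and the response fields are hypothetical placeholders. A single threading.Lock around the pipeline call means concurrent POSTs to /generate wait their turn instead of reaching the non-thread-safe fast tokenizer at the same time.

import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
pipe_lock = threading.Lock()
pipe = None  # hypothetical: assume the xFuser pipeline is built once at startup

@app.route("/generate", methods=["POST"])
def generate():
    params = request.get_json()
    # Requests that arrive while another is in flight simply block here,
    # so only one thread ever runs the tokenizer/pipeline at a time.
    with pipe_lock:
        output = pipe(
            prompt=params["prompt"],
            num_inference_steps=params.get("num_inference_steps", 28),
        )
    return jsonify({"message": "Image generated"})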
So my understanding is that it currently doesn't work like vLLM, where concurrency is naturally supported just by calling the interface?
vLLM implements batching in its serving layer. xDiT currently does not implement it in the HTTP server, because we found most people use it through ComfyUI, which has a built-in queue. Could you please provide some information on how you want to use xDiT? We will consider implementing a batch scheduler if it is really in demand.
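If such a queue/batch scheduler were added, one possible shape is a single background worker that drains a request queue while each HTTP handler blocks on its own reply queue. This is only a hedged sketch under those assumptions, not xDiT code; run_pipeline is a hypothetical stand-in for the real xFuser pipeline call.

import queue
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
job_queue: "queue.Queue[tuple[dict, queue.Queue]]" = queue.Queue()

def run_pipeline(params: dict) -> str:
    # Placeholder: call the real pipeline here and return the saved image path.
    return "/tmp/generated_image.png"

def worker() -> None:
    # The only consumer of job_queue, so the pipeline is never entered concurrently.
    while True:
        params, reply = job_queue.get()
        try:
            reply.put(("ok", run_pipeline(params)))
        except Exception as exc:  # report failures back to the waiting handler
            reply.put(("error", str(exc)))
        finally:
            job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

@app.route("/generate", methods=["POST"])
def generate():
    reply: queue.Queue = queue.Queue(maxsize=1)
    job_queue.put((request.get_json(), reply))
    status, payload = reply.get()  # block until the worker finishes this job
    return jsonify({"status": status, "result": payload}), (200 if status == "ok" else 500)

With this layout the pipeline is only ever touched by one thread, and extending worker() to pop several jobs at once would be a simple path toward micro-batching.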
We produce images through conversation.
We have fixed the concurrent access error in #359.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/./comfyui-xdit/host.py", line 195, in generate_image
    output, elapsed_time = generate_image_parallel(**params)
  File "/app/./comfyui-xdit/host.py", line 110, in generate_image_parallel
    output = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 181, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 133, in data_parallel_fn
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xfuser/model_executor/pipelines/base_pipeline.py", line 147, in check_naive_forward_fn
    return self.module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 635, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 349, in encode_prompt
    prompt_embeds = self._get_t5_prompt_embeds(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 225, in _get_t5_prompt_embeds
    untruncated_ids = self.tokenizer_2(prompt, padding="longest", return_tensors="pt").input_ids
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3055, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3142, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3338, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 517, in _batch_encode_plus
    self.set_truncation_and_padding(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 452, in set_truncation_and_padding
    self._tokenizer.no_truncation()
RuntimeError: Already borrowed
[Rank 0] 2024-10-24 04:30:03 - INFO - 127.0.0.1 - - [24/Oct/2024 04:30:03] "POST /generate HTTP/1.1" 500 -
100%|██████████| 28/28 [00:02<00:00, 11.24it/s]
[Rank 0] 2024-10-24 04:30:05 - INFO - Image generation completed in 2.66 seconds
[Rank 0] 2024-10-24 04:30:05 - INFO - Image saved to: /tmp/generated_image_20241024-043005.png
[Rank 0] 2024-10-24 04:30:05 - INFO - 127.0.0.1 - - [24/Oct/2024 04:30:05] "POST /generate HTTP/1.1" 200 -
Originally posted by @James-Dao in https://github.com/xdit-project/xDiT/issues/315#issuecomment-2434252708