vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: asymmetric tensor parallel #5541

Open leiwen83 opened 5 months ago

leiwen83 commented 5 months ago

🚀 The feature, motivation and pitch

Currently, vLLM does not support tensor parallelism when vocab_size is not evenly divisible by the tensor-parallel size.

It raises an error like:

ERROR 06-14 21:04:12 worker_base.py:165]   File "/usr/local/lib/python3.10/dist-packages/vllm/distributed/utils.py", line 29, in divide
ERROR 06-14 21:04:12 worker_base.py:165]     ensure_divisibility(numerator, denominator)
ERROR 06-14 21:04:12 worker_base.py:165]   File "/usr/local/lib/python3.10/dist-packages/vllm/distributed/utils.py", line 22, in ensure_divisibility
ERROR 06-14 21:04:12 worker_base.py:165]     assert numerator % denominator == 0, "{} is not divisible by {}".format(

But a model whose weights would fit in the GPU memory of 5 cards still has to use 8 because of this restriction, which wastes GPUs.
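The failing check in the traceback above can be sketched roughly as follows (a simplified reproduction, not the exact vLLM source; the vocab size 32064 is a hypothetical example chosen because it divides by 8 but not by 5):

```python
# Simplified sketch of the divisibility check from vllm/distributed/utils.py,
# matching the assertion message in the traceback above.

def ensure_divisibility(numerator: int, denominator: int) -> None:
    assert numerator % denominator == 0, "{} is not divisible by {}".format(
        numerator, denominator)

def divide(numerator: int, denominator: int) -> int:
    """Split a dimension evenly across tensor-parallel ranks."""
    ensure_divisibility(numerator, denominator)
    return numerator // denominator

# A hypothetical 32064-token vocab shards evenly across 8 GPUs but not 5:
print(divide(32064, 8))  # 4008 embedding rows per rank
try:
    divide(32064, 5)
except AssertionError as e:
    print(e)  # 32064 is not divisible by 5
```

So with tp=5 the embedding and LM-head shards cannot be made equal, and worker startup aborts before any weights are loaded.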

Alternatives

Could we support this asymmetric TP case, so that fewer GPU cards are needed?
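One workaround short of fully asymmetric shards would be to pad the vocabulary up to the next multiple of the TP size before sharding, so every rank gets an equal slice; the padded rows simply never correspond to real tokens. A minimal sketch (the function name `pad_vocab_size` and the sizes are illustrative, not the actual vLLM API):

```python
def pad_vocab_size(vocab_size: int, tp_size: int) -> int:
    """Round vocab_size up to the nearest multiple of tp_size (sketch)."""
    return -(-vocab_size // tp_size) * tp_size  # ceiling division * tp_size

# Hypothetical 32064-token vocab on 5 GPUs: pad to 32065, i.e. 6413 rows/rank.
padded = pad_vocab_size(32064, 5)
print(padded, padded // 5)  # 32065 6413
```

Padding keeps every shard the same size at the cost of a few dead embedding rows, whereas true asymmetric TP would let ranks hold unequal slices.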

Additional context

No response

github-actions[bot] commented 4 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!