@movchan74 thoughts?
Also, the async engine works really well since they separated out the server and API engine. For example, it is much easier to implement streaming behaviour, e.g.:
```python
import uuid

from transformers import Qwen2VLProcessor
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

engine_args = AsyncEngineArgs(
    model="Qwen/Qwen2-VL-2B-Instruct",
    limit_mm_per_prompt={"image": 3, "video": 3},
    gpu_memory_utilization=0.9,
)
vllm_vl_engine = AsyncLLMEngine.from_engine_args(engine_args)

min_pixels = 224 * 224
max_pixels = 1024 * 1024
# Use the same checkpoint as the engine (the original snippet pointed
# at the 7B processor while the engine loads the 2B model).
vl_model_processor = Qwen2VLProcessor.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)

async def stream_generate(text, mm_data):
    outputs_generator = vllm_vl_engine.generate(
        prompt={"prompt": text, "multi_modal_data": mm_data},
        sampling_params=SamplingParams(max_tokens=1024, temperature=0.0),
        request_id=str(uuid.uuid4()),
    )
    already_generated = 0
    async for output in outputs_generator:
        generated_so_far = already_generated
        already_generated = len(output.outputs[0].text)
        # Yield only the newly generated suffix on each iteration.
        yield output.outputs[0].text[generated_so_far:]
```
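For illustration, here is a minimal sketch of how the stream above could be consumed end to end. The message content and the `example.jpg` path are assumptions for the example; the chat-template call uses the processor defined above.

```python
import asyncio

from PIL import Image

async def main():
    # Build the prompt with the processor's chat template; the image
    # placeholder in the messages is expanded into Qwen2-VL vision tokens.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }]
    text = vl_model_processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # vLLM accepts a PIL image directly as multi-modal data.
    mm_data = {"image": Image.open("example.jpg")}  # hypothetical local file

    # Print each newly generated chunk as it streams in.
    async for chunk in stream_generate(text, mm_data):
        print(chunk, end="", flush=True)

asyncio.run(main())
```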
We are already on 0.6: the main branch requires vllm>=0.6.1.post2, and the poetry lock pins 0.6.2. I can update the poetry lock to 0.6.3, but we will not see any significant performance boost since we are already on 0.6.
Also, we have been using the async API since the beginning.
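(For a quick sanity check that an environment actually satisfies the constraint above, something like this sketch works; the assertion is purely illustrative and not part of the SDK.)

```python
# Illustrative check that the installed vLLM meets the main-branch constraint.
import vllm
from packaging.version import Version

installed = Version(vllm.__version__)
assert installed >= Version("0.6.1.post2"), (
    f"aana_sdk main requires vllm>=0.6.1.post2, found {installed}"
)
print(f"vllm {installed} OK")
```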
Ok, then let us keep it as it is. I will close the ticket.
I am also assuming the current release is still below 0.6 and will be upgraded in the next release.
vLLM 0.6.0 claims a major uplift in speed and performance (https://blog.vllm.ai/2024/09/05/perf-update.html). This is consistent with my observations from runs on A100 instances on Vast.ai, and it supports multimodal models better (i.e., no particular version of transformers is required for models like Qwen2-VL). So I suggest we upgrade to vLLM 0.6.3 (the latest PyPI release at the time of writing) or newer.
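To make the multimodal point concrete, here is a minimal sketch of offline Qwen2-VL inference on vLLM 0.6.x with a stock transformers release; the raw Qwen2-VL chat-format prompt and `example.jpg` are illustrative assumptions.

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Runs with a regular transformers release on vLLM >= 0.6.x; no pinned
# dev build of transformers is needed for Qwen2-VL.
llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct", limit_mm_per_prompt={"image": 1})

# Raw Qwen2-VL chat format with a single image placeholder.
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)
image = Image.open("example.jpg")  # hypothetical local file

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=256, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```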