huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Medusa models seem to be slower than the original base models #1641

Closed · infinitylogesh closed this 2 months ago

infinitylogesh commented 3 months ago

System Info

Thank you for adding support for Medusa. In my comparison of Medusa models against their original base models with TGI, the base models turned out to be faster.

I tested the models below:

[Screenshot: speed comparison of the tested Medusa and base models]

Reproduction

Command used:

docker run --gpus all --shm-size 1g -p 8081:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa --num-shard 1 

Hardware:

1xH100
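For reference, a minimal sketch of how the comparison can be quantified: time a generation request against the running container and compute tokens per second. It assumes the container from the command above is listening on localhost:8081 and uses TGI's /generate endpoint; the prompt, token counts, and number of runs are arbitrary choices for illustration.

import time
import requests

# Assumes the TGI container started above is listening on localhost:8081.
URL = "http://localhost:8081/generate"

def tokens_per_second(prompt: str, max_new_tokens: int = 128) -> float:
    """Time a single /generate call and return decode throughput."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "details": True},
    }
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    generated = resp.json()["details"]["generated_tokens"]
    return generated / elapsed

if __name__ == "__main__":
    prompt = "Explain speculative decoding in one paragraph."
    # Warm-up request so caches do not skew the first timing.
    tokens_per_second(prompt, max_new_tokens=16)
    rates = [tokens_per_second(prompt) for _ in range(5)]
    print(f"mean throughput: {sum(rates) / len(rates):.1f} tokens/s")

Running the same script against a container serving the base Mistral-7B-Instruct-v0.2 on the same GPU gives a like-for-like comparison. Separately, it may be worth confirming in the launcher logs that speculation is actually active for the Medusa model; the launcher also exposes a --speculate argument controlling the number of speculative tokens.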

Expected behavior

Medusa models should be faster than the original, non-Medusa base models.
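Worth noting for anyone triaging this: Medusa only wins when the extra work per forward pass (the additional heads plus verification of the candidate tree) is paid back by accepted draft tokens. A rough back-of-the-envelope model, illustrative only and not TGI's internal accounting:

# Illustrative model of Medusa's net speedup (not TGI's internal accounting).
# Each Medusa step costs more than a plain decode step, but can emit
# several tokens at once if the draft heads' predictions are accepted.

def medusa_speedup(mean_accepted: float, step_overhead: float) -> float:
    """mean_accepted: average tokens emitted per Medusa step (>= 1.0).
    step_overhead: cost of one Medusa step relative to one plain step."""
    return mean_accepted / step_overhead

# If ~2.5 tokens are accepted per step at 1.3x step cost, Medusa wins:
print(medusa_speedup(2.5, 1.3))   # ~1.92x faster
# With poor acceptance (e.g. off-distribution prompts) it loses:
print(medusa_speedup(1.1, 1.3))   # ~0.85x, i.e. slower than the base model

So a measured slowdown is consistent with low acceptance rates on the prompts being tested, not necessarily a bug in the serving path.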

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.