Closed · infinitylogesh closed 2 months ago
Thank you for adding support for Medusa. In my comparison of Medusa models against the original base models in TGI, the base models appeared to be faster.

I tested the models below:

Command used:

```
docker run --gpus all --shm-size 1g -p 8081:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa --num-shard 1
```

Hardware: 1x H100

Expected behavior: Medusa models should be faster than the original, non-Medusa models.
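For reference, a minimal sketch (not from the issue) of how one could measure the throughput being compared: run it once against the Medusa model and once against the base model, each started with the command above. It assumes a TGI server reachable on port 8081 (the port mapped in the docker command); the function names `benchmark` and `tokens_per_second` are hypothetical helpers, not part of TGI.

```python
# Hypothetical benchmark sketch; assumes a TGI server on localhost:8081.
import json
import time
import urllib.request

TGI_URL = "http://localhost:8081/generate"  # port mapped in the docker command


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput of a generation call, in generated tokens per second."""
    return n_tokens / elapsed_s


def benchmark(prompt: str, max_new_tokens: int = 128) -> float:
    """Time one /generate request against a running TGI server and
    return its decode throughput."""
    payload = json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "details": True},
    }).encode()
    req = urllib.request.Request(
        TGI_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # With "details": True, TGI includes the generated tokens in the response.
    return tokens_per_second(len(body["details"]["tokens"]), elapsed)


print(tokens_per_second(128, 4.0))  # the pure helper needs no server: 32.0
```

Averaging several calls per model (and discarding the first warm-up request) would give a fairer comparison than a single timing.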
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.