tenstorrent / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
Add top-k top-p sampling and clean up input preparation
#11
Closed
skhorasganiTT closed this 2 months ago
skhorasganiTT commented 2 months ago
Moved any extra input preparation from execute_model to prepare_model_inputs
Removed padded logits from the batch before sampling
Added a top-k/top-p sampling option and extra verification of sampling parameters (see the sketch after this list)
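For context on the sampling change, the sketch below is a minimal, standalone illustration of top-k/top-p truncation with parameter verification in PyTorch. It is not the code from this PR; the helper names (`verify_sampling_params`, `sample_top_k_top_p`, `num_valid_seqs`) and the exact validation bounds are assumptions for illustration only.

```python
import torch

def verify_sampling_params(top_k: int, top_p: float, vocab_size: int) -> None:
    # Extra verification of sampling parameters (bounds here are assumptions).
    if not (top_k == -1 or 1 <= top_k <= vocab_size):
        raise ValueError(f"top_k must be -1 (disabled) or in [1, {vocab_size}], got {top_k}")
    if not (0.0 < top_p <= 1.0):
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")

def sample_top_k_top_p(logits: torch.Tensor, top_k: int = -1, top_p: float = 1.0) -> torch.Tensor:
    """Sample one token id per row of `logits` ([batch, vocab_size])."""
    verify_sampling_params(top_k, top_p, logits.shape[-1])
    if top_k != -1:
        # Mask everything below the k-th largest logit in each row.
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        cum_probs = sorted_probs.cumsum(dim=-1)
        # Drop tokens whose preceding cumulative mass already covers top_p.
        remove = (cum_probs - sorted_probs) >= top_p
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

# Hypothetical usage, also illustrating the second point above: drop padded rows
# from the batched logits before sampling (assumes the first `num_valid_seqs`
# rows are real sequences and the rest are padding).
num_valid_seqs = 3
padded_logits = torch.randn(8, 32_000)
token_ids = sample_top_k_top_p(padded_logits[:num_valid_seqs], top_k=50, top_p=0.9)
```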