tenstorrent / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

https://docs.vllm.ai

Apache License 2.0

Add top-k top-p sampling and clean up input preparation #11

Closed: skhorasganiTT closed 2 months ago

skhorasganiTT commented 2 months ago

Move any extra input preparation from execute_model to prepare_model_inputs
Remove padded logits from the batch before sampling
Add a top-k/top-p sampling option and extra verification of sampling parameters
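
For readers unfamiliar with the sampling changes listed above, here is a minimal sketch of the general technique, not the code from this PR. It assumes padded rows sit at the end of the batch, applies top-k first and then top-p to the renormalized distribution, and uses plain Python lists rather than tensors. All names (`validate_sampling_params`, `drop_padded_rows`, `top_k_top_p_probs`, `sample_token`) are hypothetical.

```python
import math
import random
from typing import List, Sequence


def validate_sampling_params(top_k: int, top_p: float, vocab_size: int) -> None:
    # Hypothetical checks; the PR's actual verifications are not shown in this thread.
    if not 1 <= top_k <= vocab_size:
        raise ValueError(f"top_k must be in [1, {vocab_size}], got {top_k}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")


def drop_padded_rows(
    batch_logits: Sequence[Sequence[float]], num_real_seqs: int
) -> List[List[float]]:
    # Assumption: rows past num_real_seqs exist only to pad the batch.
    return [list(row) for row in batch_logits[:num_real_seqs]]


def top_k_top_p_probs(logits: Sequence[float], top_k: int, top_p: float) -> List[float]:
    """Return a distribution restricted to the top-k tokens and then to the
    smallest prefix of them whose cumulative probability reaches top_p."""
    validate_sampling_params(top_k, top_p, len(logits))

    # Top-k: indices of the k largest logits, highest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax over the surviving logits (shifted by the max for stability).
    peak = logits[order[0]]
    weights = [math.exp(logits[i] - peak) for i in order]
    total = sum(weights)

    # Top-p: keep tokens until their cumulative probability reaches top_p.
    kept: List[int] = []
    kept_mass = 0.0
    cumulative = 0.0
    for i, w in zip(order, weights):
        kept.append(i)
        kept_mass += w
        cumulative += w / total
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; every other token gets probability 0.
    probs = [0.0] * len(logits)
    for i in kept:
        probs[i] = math.exp(logits[i] - peak) / kept_mass
    return probs


def sample_token(
    logits: Sequence[float], top_k: int, top_p: float, rng: random.Random
) -> int:
    probs = top_k_top_p_probs(logits, top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: one real sequence followed by one padding row.
rng = random.Random(0)
batch = drop_padded_rows([[2.0, 1.0, 0.5, -1.0], [0.0, 0.0, 0.0, 0.0]], num_real_seqs=1)
tokens = [sample_token(row, top_k=2, top_p=0.9, rng=rng) for row in batch]
```

Dropping the padded rows before filtering means no sorting or sampling work is spent on sequences that do not exist, which is presumably the motivation for the second item above.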