Open simonucl opened 6 days ago
A minimal working implementation is done!
The development branch is at https://github.com/simonucl/vllm/tree/contrastive-decoding, with working code under tests/contrast_decode/run.py. It's still WIP, and any feedback is appreciated! Also, feel free to request any functionality that fits your needs.
🚀 The feature, motivation and pitch
Contrastive Decoding (Li et al., 2022) is a decoding strategy that contrasts the log probabilities of two or more models at each token, shifting the next-token distribution to improve output quality or reduce harmful outputs (Liu et al., 2021). Related work includes Proxy-tuning (Liu et al., 2024), Emulator on aligned models (Mitchell et al., 2023), improved reasoning (O'Brien et al., 2023), and test-time alignment (Zhu et al., 2024). The approach also supports the recent interest in test-time alignment (Xu et al., 2024), where a token-level reward model produces partial rewards at each decoding step to guide generation.
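To make the per-token contrast concrete, here is a minimal sketch of the scoring rule from Li et al. (2022) in plain Python. It is not the code on the development branch: the function names and the default `alpha=0.1` are illustrative assumptions. Tokens whose expert probability falls below `alpha` times the expert's top probability are masked out (the plausibility constraint), and the remaining tokens are scored by the expert log probability minus the amateur log probability.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=0.1):
    """Score each token as log p_expert - log p_amateur.

    Tokens with p_expert < alpha * max(p_expert) get -inf
    (plausibility constraint). `alpha` is an assumed default.
    """
    exp_lp = log_softmax(expert_logits)
    ama_lp = log_softmax(amateur_logits)
    cutoff = max(exp_lp) + math.log(alpha)
    return [
        e - a if e >= cutoff else float("-inf")
        for e, a in zip(exp_lp, ama_lp)
    ]


# Toy vocabulary of three tokens.
scores = contrastive_scores([2.0, 1.0, -5.0], [2.0, 0.0, 0.0])
best_token = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case both models like token 0 about equally, so its score is close to zero, while token 1 is favoured by the expert only and wins the contrast. Token 2 is implausible under the expert and is masked regardless of the amateur's opinion.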
Contributions are welcome!
I am currently working on the implementation, and any contributions would be highly appreciated. The initial idea is similar to the speculative decoding method under spec_decode/, where two or more models are loaded onto the GPU and run inference at each timestep. More details will be shared soon!
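As a rough illustration of the idea that several models run inference at each timestep, the sketch below shows a greedy decoding loop that queries every model on the same prefix and combines their log probabilities with fixed weights. Everything here is hypothetical: the `Model` callable interface, `contrastive_generate`, and the toy `expert` and `amateur` functions are stand-ins, not vLLM APIs, and the plausibility constraint is omitted for brevity.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: a model maps the token ids so far to next-token logits.
Model = Callable[[Sequence[int]], List[float]]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_generate(models, weights, prompt, max_new_tokens, eos_id):
    """Greedy decoding over a weighted sum of per-model log probabilities."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Every model scores the same prefix at this timestep.
        per_model = [log_softmax(model(tokens)) for model in models]
        vocab_size = len(per_model[0])
        combined = [
            sum(w * lp[i] for w, lp in zip(weights, per_model))
            for i in range(vocab_size)
        ]
        next_id = max(range(vocab_size), key=combined.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


# Toy models over a 3-token vocabulary; token 2 plays the role of EOS.
def expert(tokens):
    return [2.0, 1.0, 0.0] if len(tokens) < 3 else [0.0, 0.0, 5.0]


def amateur(tokens):
    return [2.0, 0.0, 0.0]


# Weights (+1, -1) give expert-minus-amateur scoring.
out = contrastive_generate(
    [expert, amateur], [1.0, -1.0], prompt=[0], max_new_tokens=5, eos_id=2
)
```

A weight vector such as `[1.0, 1.0, -1.0]` over a base model, an expert, and an anti-expert would express a proxy-tuning-style combination with the same loop. A real implementation would presumably batch these calls and operate on GPU tensors rather than Python lists.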