vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
27.79k stars 4.1k forks

[Feature]: Support for Seq classification/Reward models #8700

Open ariaattar opened 1 week ago

ariaattar commented 1 week ago

🚀 The feature, motivation and pitch

Verifier/reward models are going to be very important moving forward.

Could we add support for sequence classification models like Skywork/Skywork-Reward-Llama-3.1-8B?

Alternatives

No response

Additional context

No response

youkaichao commented 1 week ago

Contributions are welcome!

vLLM already supports embedding models, and I think reward models are quite similar. I don't see what the obstacle would be to running reward models with the vLLM code: we can pretend they are embedding models.
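To illustrate the suggestion of treating a reward model like an embedding model, here is a minimal PyTorch sketch. It is a hypothetical stand-in (random tensors, tiny dimensions), not vLLM code: the backbone's final hidden states are pooled like an embedding model's output, then projected through a 1-dimensional `score` head in the style of the HF *ForSequenceClassification convention.

```python
import torch

# Hypothetical sketch: a reward model viewed as an embedding model whose
# pooled hidden state passes through a 1-dim classification head.
# Dimensions are illustrative, not from a real checkpoint.
hidden_size = 16
batch, seq_len = 2, 8

# Stand-in for the transformer backbone's final hidden states.
hidden_states = torch.randn(batch, seq_len, hidden_size)

# *ForSequenceClassification replaces the vocab-sized lm_head with a
# `score` projection of shape [num_labels, hidden_size] (num_labels=1
# for a scalar reward).
score = torch.nn.Linear(hidden_size, 1, bias=False)

# Pool like an embedding model (last-token pooling here), then project.
pooled = hidden_states[:, -1, :]      # [batch, hidden_size]
rewards = score(pooled).squeeze(-1)   # [batch]
print(rewards.shape)
```

The pooling strategy (last token vs. mean) varies between models, so a real integration would need to read it from each model's config.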

ariaattar commented 1 week ago

Seems like a lot of the reward models use a different architecture than embedding models.

ValueError: Model architectures ['Gemma2ForSequenceClassification'] are not supported for now.
ValueError: Model architectures ['LlamaForSequenceClassification'] are not supported for now. 

 Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration']

Here are two examples: Skywork/Skywork-Reward-Gemma-2-27B Ray2333/GRM-Llama3-8B-rewardmodel-ft
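The ValueError above comes down to the checkpoint's output head: the sequence-classification variants carry a differently named and differently shaped head than the causal-LM variants vLLM knows how to load. A hypothetical sketch (weight names follow the HF convention; the shapes are illustrative, Llama-3.1-8B-like, not read from a real checkpoint):

```python
# Hypothetical sketch of why a causal-LM loader rejects these checkpoints:
# the output-head parameter differs in both name and shape between the two
# variants of the same backbone.
causal_lm_head = {"lm_head.weight": (128256, 4096)}  # [vocab_size, hidden]
seq_cls_head = {"score.weight": (1, 4096)}           # [num_labels, hidden]

def head_kind(state_dict_keys):
    """Classify a checkpoint by its output-head parameter name."""
    if "lm_head.weight" in state_dict_keys:
        return "causal_lm"
    if "score.weight" in state_dict_keys:
        return "sequence_classification"
    return "unknown"

print(head_kind(causal_lm_head))  # causal_lm
print(head_kind(seq_cls_head))    # sequence_classification
```

So supporting these models is less about the backbone (Llama/Gemma2 are already supported) and more about registering the classification head and its weight-loading path.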

ariaattar commented 1 week ago

@youkaichao Added #8740. I tried to convert it to the vLLM format, but I'm running into some tensor shape issues in compute_logits. Let me know if the conversion generally looks right.
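The kind of shape mismatch that tends to surface in a logits-computation path can be illustrated with a small PyTorch sketch (tiny, hypothetical dimensions): code written for sampling expects `[num_tokens, vocab_size]` logits, while a reward head emits `[num_tokens, 1]`.

```python
import torch

# Hypothetical illustration of the shape mismatch: downstream code built
# for sampling expects vocab-sized logits, but a reward head produces a
# single score per token/sequence. Dimensions are illustrative only.
hidden_size, vocab_size, num_tokens = 8, 32, 4

hidden = torch.randn(num_tokens, hidden_size)
lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False)
score = torch.nn.Linear(hidden_size, 1, bias=False)

print(lm_head(hidden).shape)  # what a sampler expects: [4, 32]
print(score(hidden).shape)    # what a reward head gives: [4, 1]
```

Any conversion therefore has to bypass the sampling path entirely and return the pooled score directly, rather than feeding it through logits processing.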

youkaichao commented 3 days ago

looks like https://github.com/vllm-project/vllm/pull/8896 already implements it.

natolambert commented 3 days ago

Hey - I've been working with reward models substantially in the open ecosystem building rewardbench, and in reality most of the open models have subtly different architectures.

The easiest case is *ForSequenceClassification, but it gets much messier from there. Happy to take questions -- I'm going to look at the reward model RFC now too. Curious to follow these implementations.
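For reference, the "easiest" *ForSequenceClassification case can be exercised end to end with HF transformers alone. The sketch below uses a tiny, randomly initialized Llama config purely to show the interface and output shape; it is not a real reward model, and `num_labels=1` is the convention reward models use to emit a scalar score.

```python
import torch
from transformers import LlamaConfig, LlamaForSequenceClassification

# Tiny, randomly initialized config -- illustrative only, not a real
# reward model. num_labels=1 yields one scalar reward per sequence.
config = LlamaConfig(
    vocab_size=64,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    num_labels=1,
    pad_token_id=0,
)
model = LlamaForSequenceClassification(config)

input_ids = torch.tensor([[5, 9, 3, 7]])
with torch.no_grad():
    out = model(input_ids)

# The head pools the last (non-pad) token and projects to num_labels.
print(out.logits.shape)  # [batch, num_labels]
```

Models that deviate from this interface (custom heads, pairwise heads, gating layers) are where the per-architecture messiness comes in.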