Open ariaattar opened 1 week ago
Contributions are welcome!
vLLM already supports embedding models, and I think reward models are quite similar. I don't know what the obstacle would be to running reward models with the existing vLLM code. We can pretend they are embedding models.
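The analogy can be made concrete: both embedding models and reward models run the same transformer backbone and then apply a small head to a pooled hidden state; only the head differs. Here is a minimal pure-Python sketch of that shared structure (all function names, shapes, and weights below are illustrative stand-ins, not vLLM internals):

```python
# Sketch: embedding and reward models share a backbone; only the head differs.
# Everything here is a toy stand-in, not actual vLLM code.

def backbone(token_ids):
    # Stand-in for the transformer: one hidden vector per input token.
    hidden_size = 4
    return [[float((t + i) % 3) for i in range(hidden_size)] for t in token_ids]

def pool_last(hidden_states):
    # A common pooling choice for both tasks: take the last token's state.
    return hidden_states[-1]

def embed_head(vec):
    # Embedding model: return the pooled vector, L2-normalized.
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def reward_head(vec, weights):
    # Reward model: a linear layer projecting to a single scalar score.
    return sum(w * x for w, x in zip(weights, vec))

hidden = backbone([5, 9, 2])
pooled = pool_last(hidden)
embedding = embed_head(pooled)                        # vector output
score = reward_head(pooled, [0.5, -0.25, 0.1, 0.0])  # scalar output
```

So in principle the embedding-model path gets a reward model most of the way there; the remaining work is wiring up the scalar head and the checkpoint's weight names.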
Seems like a lot of the reward models use a different architecture than embedding models.
ValueError: Model architectures ['Gemma2ForSequenceClassification'] are not supported for now.
ValueError: Model architectures ['LlamaForSequenceClassification'] are not supported for now.
Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration']
Here are two examples: Skywork/Skywork-Reward-Gemma-2-27B and Ray2333/GRM-Llama3-8B-rewardmodel-ft.
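The ValueError above is vLLM failing fast at load time: the architecture string from the checkpoint's config.json is looked up in a registry of supported model classes, and the *ForSequenceClassification names simply are not in it. A hypothetical sketch of that lookup (the dict contents and function name are illustrative, not vLLM's actual registry):

```python
# Hypothetical sketch of an architecture-registry lookup, showing why
# unsupported architectures raise; names are not actual vLLM internals.

SUPPORTED_ARCHS = {
    "LlamaForCausalLM": "causal-lm implementation",
    "Gemma2ForCausalLM": "causal-lm implementation",
    # ... no *ForSequenceClassification entries, hence the errors above.
}

def resolve_model_cls(architectures):
    for arch in architectures:
        if arch in SUPPORTED_ARCHS:
            return SUPPORTED_ARCHS[arch]
    raise ValueError(
        f"Model architectures {architectures} are not supported for now.")

try:
    resolve_model_cls(["Gemma2ForSequenceClassification"])
except ValueError as e:
    print(e)  # reproduces the shape of the error message in the traceback
```

Adding support therefore means registering a sequence-classification model class, not just renaming the architecture.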
@youkaichao I opened #8740, which tries to convert it to the vLLM format, but I'm running into some tensor shape issues in compute_logits. Let me know if the conversion generally looks right.
Looks like https://github.com/vllm-project/vllm/pull/8896 already implements it.
Hey -- I've been working with reward models substantially in the open ecosystem while building RewardBench, and in reality most of the open models have subtly different architectures. The easiest case is *ForSequenceClassification, but it gets much messier from there. Happy to take questions -- I'm going to look at the reward model RFC now too. Curious to follow these implementations.
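One concrete form those "subtly different architectures" take is the name of the scalar head in the checkpoint: some models store it as a classification head, others as a value head from RLHF training. A loader that wants to cover several open reward models has to map these conventions onto one internal weight. A hypothetical sketch (the alias list reflects patterns seen in open checkpoints, e.g. HF classification heads and TRL-style value heads, but is not an actual vLLM loader):

```python
# Hypothetical sketch: reward checkpoints name their scalar head differently,
# so a loader maps several naming conventions onto one internal "score" weight.
# The alias list is illustrative, not exhaustive or taken from vLLM.

HEAD_ALIASES = ("score.weight", "classifier.weight", "v_head.summary.weight")

def find_reward_head(state_dict):
    for name in HEAD_ALIASES:
        if name in state_dict:
            return name, state_dict[name]
    raise KeyError("no recognized reward head in checkpoint")

# Toy state dict standing in for a loaded checkpoint:
ckpt = {"model.layers.0.attn.weight": [0.1], "score.weight": [[0.2, -0.3]]}
name, weight = find_reward_head(ckpt)
```

Models with genuinely custom heads (extra MLP layers, gating, multiple objectives) would still need per-model code, which is the messier part.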
🚀 The feature, motivation and pitch
Verifier/reward models are going to be very important moving forward. Could we add support for sequence classification models like Skywork/Skywork-Reward-Llama-3.1-8B?