Open ariaattar opened 1 week ago
Contributions are welcome!
vLLM already supports embedding models, and I think reward models are quite similar. I don't know what the obstacle would be to running reward models with the existing vLLM code. We can pretend they are embedding models.
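The analogy can be made concrete: both embedding models and reward models run the same transformer backbone and then apply a small head to a pooled hidden state; only the head differs. Here is a minimal pure-Python sketch of that shared structure (all function names, shapes, and weights below are illustrative stand-ins, not vLLM internals):

```python
# Sketch: embedding and reward models share a backbone; only the head differs.
# Everything here is a toy stand-in, not actual vLLM code.

def backbone(token_ids):
    # Stand-in for the transformer: one hidden vector per input token.
    hidden_size = 4
    return [[float((t + i) % 3) for i in range(hidden_size)] for t in token_ids]

def pool_last(hidden_states):
    # A common pooling choice for both tasks: take the last token's state.
    return hidden_states[-1]

def embed_head(vec):
    # Embedding model: return the pooled vector, L2-normalized.
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def reward_head(vec, weights):
    # Reward model: a linear layer projecting to a single scalar score.
    return sum(w * x for w, x in zip(weights, vec))

hidden = backbone([5, 9, 2])
pooled = pool_last(hidden)
embedding = embed_head(pooled)                        # vector output
score = reward_head(pooled, [0.5, -0.25, 0.1, 0.0])  # scalar output
```

So in principle the embedding-model path gets a reward model most of the way there; the remaining work is wiring up the scalar head and the checkpoint's weight names.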
Seems like a lot of the reward models use a different architecture than embedding models.
ValueError: Model architectures ['Gemma2ForSequenceClassification'] are not supported for now.
ValueError: Model architectures ['LlamaForSequenceClassification'] are not supported for now.
Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration']
Here are two examples: Skywork/Skywork-Reward-Gemma-2-27B and Ray2333/GRM-Llama3-8B-rewardmodel-ft.
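The ValueError above is vLLM failing fast at load time: the architecture string from the checkpoint's config.json is looked up in a registry of supported model classes, and the *ForSequenceClassification names simply are not in it. A hypothetical sketch of that lookup (the dict contents and function name are illustrative, not vLLM's actual registry):

```python
# Hypothetical sketch of an architecture-registry lookup, showing why
# unsupported architectures raise; names are not actual vLLM internals.

SUPPORTED_ARCHS = {
    "LlamaForCausalLM": "causal-lm implementation",
    "Gemma2ForCausalLM": "causal-lm implementation",
    # ... no *ForSequenceClassification entries, hence the errors above.
}

def resolve_model_cls(architectures):
    for arch in architectures:
        if arch in SUPPORTED_ARCHS:
            return SUPPORTED_ARCHS[arch]
    raise ValueError(
        f"Model architectures {architectures} are not supported for now.")

try:
    resolve_model_cls(["Gemma2ForSequenceClassification"])
except ValueError as e:
    print(e)  # reproduces the shape of the error message in the traceback
```

Adding support therefore means registering a sequence-classification model class, not just renaming the architecture.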
@youkaichao I opened #8740, which tries to convert it to the vLLM format, but I'm running into some tensor shape issues in compute_logits. Let me know if the conversion generally looks right.
Looks like https://github.com/vllm-project/vllm/pull/8896 already implements it.
Hey -- I've been working with reward models substantially in the open ecosystem while building RewardBench, and in reality most of the open models have subtly different architectures. The easiest case is *ForSequenceClassification, but it gets much messier from there. Happy to take questions -- I'm going to look at the reward model RFC now too. Curious to follow these implementations.
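One concrete form those "subtly different architectures" take is the name of the scalar head in the checkpoint: some models store it as a classification head, others as a value head from RLHF training. A loader that wants to cover several open reward models has to map these conventions onto one internal weight. A hypothetical sketch (the alias list reflects patterns seen in open checkpoints, e.g. HF classification heads and TRL-style value heads, but is not an actual vLLM loader):

```python
# Hypothetical sketch: reward checkpoints name their scalar head differently,
# so a loader maps several naming conventions onto one internal "score" weight.
# The alias list is illustrative, not exhaustive or taken from vLLM.

HEAD_ALIASES = ("score.weight", "classifier.weight", "v_head.summary.weight")

def find_reward_head(state_dict):
    for name in HEAD_ALIASES:
        if name in state_dict:
            return name, state_dict[name]
    raise KeyError("no recognized reward head in checkpoint")

# Toy state dict standing in for a loaded checkpoint:
ckpt = {"model.layers.0.attn.weight": [0.1], "score.weight": [[0.2, -0.3]]}
name, weight = find_reward_head(ckpt)
```

Models with genuinely custom heads (extra MLP layers, gating, multiple objectives) would still need per-model code, which is the messier part.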
🚀 The feature, motivation and pitch
Verifier/reward models are going to be very important moving forward. Could we add support for sequence classification models like Skywork/Skywork-Reward-Llama-3.1-8B?