Hi. Thanks for pointing this out. Yes, this should be consistent with our results. We modified and retrained the compression reward model several times with different architectures, which accidentally introduced an inconsistency between the released weights and this code. Please feel free to explore more suitable compression reward designs. In my experience, the performance of reward_model.pt is not very good.
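(One simple non-learned alternative used in related alignment work is to reward JPEG compressibility of the decoded frames directly, instead of going through the learned MLP. The sketch below only illustrates that idea; the function name, the expected frame format, and the JPEG quality setting are assumptions, not part of this repo.)

```python
import io

import numpy as np
from PIL import Image


def jpeg_compressibility_reward(frames: np.ndarray, quality: int = 95) -> np.ndarray:
    """Hypothetical compression reward: negative JPEG size (in KB) per frame.

    `frames` is assumed to be a uint8 array of shape (N, H, W, 3).
    Smaller encoded size -> higher reward, i.e. more compressible frames.
    """
    rewards = []
    for frame in frames:
        buffer = io.BytesIO()
        Image.fromarray(frame).save(buffer, format="JPEG", quality=quality)
        rewards.append(-buffer.tell() / 1024.0)  # negative encoded size in KB
    return np.asarray(rewards, dtype=np.float32)
```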
Thanks for your reply. I have reproduced your result on vc2 with the aesthetic reward, and it looks fine.
------------------ Original ------------------
From: Zheyang Qin
Date: Thu, Oct 31, 2024, 2:59 PM
To: mihirp1998/VADER
Cc: HaoyuWu556, Author
Subject: Re: [mihirp1998/VADER] Incompatibility of compression reward model .pt and parameters (Issue #16)
The state dict of reward_model.pt contains the following entries:

layers.0.weight  torch.Size([512, 768])
layers.0.bias    torch.Size([512])
layers.3.weight  torch.Size([128, 512])
layers.3.bias    torch.Size([128])
layers.6.weight  torch.Size([32, 128])
layers.6.bias    torch.Size([32])
layers.9.weight  torch.Size([1, 32])
layers.9.bias    torch.Size([1])

But the code in compression_scorer.py defines a 5-layer MLP:
https://github.com/VideoVerses/VideoTuna/blob/c12a04ea5d0b4f5e69b41f960df8267911c41b61/src/lvdm/models/rlhf_utils/compression_scorer.py#L39

I customized the model architecture as follows:
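(The custom architecture code from the original comment was not captured above; a module whose state dict matches the listed keys and shapes would look roughly like this sketch. The linear-layer sizes, 768 → 512 → 128 → 32 → 1, come from the checkpoint; the ReLU/Dropout layers at indices 1–2, 4–5, 7–8 are assumptions, since only the linear parameters appear in the state dict.)

```python
import torch
import torch.nn as nn


class CompressionRewardMLP(nn.Module):
    """Hypothetical module matching the checkpoint's key/shape layout."""

    def __init__(self, input_dim: int = 768, dropout: float = 0.2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 512),  # layers.0 (from checkpoint)
            nn.ReLU(),                  # layers.1 (assumed)
            nn.Dropout(dropout),        # layers.2 (assumed)
            nn.Linear(512, 128),        # layers.3 (from checkpoint)
            nn.ReLU(),                  # layers.4 (assumed)
            nn.Dropout(dropout),        # layers.5 (assumed)
            nn.Linear(128, 32),         # layers.6 (from checkpoint)
            nn.ReLU(),                  # layers.7 (assumed)
            nn.Dropout(dropout),        # layers.8 (assumed)
            nn.Linear(32, 1),           # layers.9 (from checkpoint)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Loading with strict=True confirms the key/shape match:
# model = CompressionRewardMLP()
# model.load_state_dict(torch.load("reward_model.pt", map_location="cpu"), strict=True)
```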
Then I got rewards roughly in the range [20, 50]. Does this align with your results?
Or should I open a PR to fix this?
Thanks for your work.
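(For anyone hitting the same mismatch, the shapes above can be dumped directly from the checkpoint and compared against the layers defined in compression_scorer.py. A minimal sketch, assuming the file stores a plain state dict and the path below:)

```python
import torch

# Print every parameter name and shape stored in the checkpoint.
state_dict = torch.load("reward_model.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```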