Hi. Thanks for pointing this out. Yes, this should be consistent with our results. We modified and retrained the compression reward model several times with different architectures, which accidentally introduced an inconsistency between the released weights and this code. Please feel free to explore more suitable compression reward designs. In my experience, the performance of reward_model.pt is not very good.
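(One simple non-learned alternative used in related alignment work is to reward JPEG compressibility of the decoded frames directly, instead of going through the learned MLP. The sketch below only illustrates that idea; the function name, the expected frame format, and the JPEG quality setting are assumptions, not part of this repo.)

```python
import io

import numpy as np
from PIL import Image


def jpeg_compressibility_reward(frames: np.ndarray, quality: int = 95) -> np.ndarray:
    """Hypothetical compression reward: negative JPEG size (in KB) per frame.

    `frames` is assumed to be a uint8 array of shape (N, H, W, 3).
    Smaller encoded size -> higher reward, i.e. more compressible frames.
    """
    rewards = []
    for frame in frames:
        buffer = io.BytesIO()
        Image.fromarray(frame).save(buffer, format="JPEG", quality=quality)
        rewards.append(-buffer.tell() / 1024.0)  # negative encoded size in KB
    return np.asarray(rewards, dtype=np.float32)
```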
Thanks for your reply. I have reproduced your result on vc2 with the aesthetic reward, and it looks fine.
------------------ Original ------------------
From: Zheyang Qin
Date: Thu, Oct 31, 2024, 2:59 PM
To: mihirp1998/VADER
Cc: HaoyuWu556, Author
Subject: Re: [mihirp1998/VADER] Incompatibility of compression reward model .pt and parameters (Issue #16)
The state dict of reward_model.pt contains the following entries:

layers.0.weight  torch.Size([512, 768])
layers.0.bias    torch.Size([512])
layers.3.weight  torch.Size([128, 512])
layers.3.bias    torch.Size([128])
layers.6.weight  torch.Size([32, 128])
layers.6.bias    torch.Size([32])
layers.9.weight  torch.Size([1, 32])
layers.9.bias    torch.Size([1])

But the code in compression_scorer.py defines a 5-layer MLP:
https://github.com/VideoVerses/VideoTuna/blob/c12a04ea5d0b4f5e69b41f960df8267911c41b61/src/lvdm/models/rlhf_utils/compression_scorer.py#L39

I customized the model architecture as follows:
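(The custom architecture code from the original comment was not captured above; a module whose state dict matches the listed keys and shapes would look roughly like this sketch. The linear-layer sizes, 768 → 512 → 128 → 32 → 1, come from the checkpoint; the ReLU/Dropout layers at indices 1–2, 4–5, 7–8 are assumptions, since only the linear parameters appear in the state dict.)

```python
import torch
import torch.nn as nn


class CompressionRewardMLP(nn.Module):
    """Hypothetical module matching the checkpoint's key/shape layout."""

    def __init__(self, input_dim: int = 768, dropout: float = 0.2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 512),  # layers.0 (from checkpoint)
            nn.ReLU(),                  # layers.1 (assumed)
            nn.Dropout(dropout),        # layers.2 (assumed)
            nn.Linear(512, 128),        # layers.3 (from checkpoint)
            nn.ReLU(),                  # layers.4 (assumed)
            nn.Dropout(dropout),        # layers.5 (assumed)
            nn.Linear(128, 32),         # layers.6 (from checkpoint)
            nn.ReLU(),                  # layers.7 (assumed)
            nn.Dropout(dropout),        # layers.8 (assumed)
            nn.Linear(32, 1),           # layers.9 (from checkpoint)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Loading with strict=True confirms the key/shape match:
# model = CompressionRewardMLP()
# model.load_state_dict(torch.load("reward_model.pt", map_location="cpu"), strict=True)
```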
Then I got rewards roughly in the range [20, 50]. Does this align with your results?
Or should I open a PR to fix this?
Thanks for your work.
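(For anyone hitting the same mismatch, the shapes above can be dumped directly from the checkpoint and compared against the layers defined in compression_scorer.py. A minimal sketch, assuming the file stores a plain state dict and the path below:)

```python
import torch

# Print every parameter name and shape stored in the checkpoint.
state_dict = torch.load("reward_model.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```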