lucidrains self-rewarding-lm-pytorch issues

lucidrains / self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI

MIT License

1.32k stars 73 forks

issues


What's the reference model for DPO?
#31 Draconda closed 5 months ago (1 comment)

OSError: [Errno 22] Invalid argument: 'preference_seq.memmap.npy'
#30 Oloup opened 5 months ago (0 comments)

Fixed deep copy, shallow copy error and label mask error.
#29 Control-derek closed 5 months ago (1 comment)

Solves the problem that some variables are not declared
#28 Control-derek closed 6 months ago (1 comment)

Solves the problem that some variables are not declared
#27 Control-derek closed 6 months ago (1 comment)

add self.
#26 Control-derek closed 6 months ago (1 comment)

ModuleNotFoundError: No module named 'x_transformers'
#25 mayankpathaklumiq opened 7 months ago (1 comment)

UnboundLocalError: local variable 'self_reward_model' referenced before assignment
#24 UbeCc closed 2 months ago (3 comments)

What changes should I make to apply the method on Llama2?
#23 Labmem009 opened 7 months ago (0 comments)

I encountered the following error when trying to run usage
#21 Yanfors opened 7 months ago (1 comment)

Fix TypeError for is_valid_reward in SelfRewardDPOConfig
#19 ViswanathaReddyGajjala closed 7 months ago (1 comment)

TypeError: tuple indices must be integers or slices, not tuple
#18 fakerybakery opened 7 months ago (1 comment)

Update self_rewarding_lm_pytorch.py
#17 unaidedelf8777 closed 7 months ago (1 comment)

RuntimeError: Placeholder storage has not been allocated on MPS device!
#15 fakerybakery closed 8 months ago (2 comments)

Multiple GPUs
#14 fakerybakery closed 8 months ago (0 comments)

Update self_rewarding_lm_pytorch.py
#13 Dyke-F closed 8 months ago (1 comment)

Update spin.py
#12 Dyke-F closed 8 months ago (2 comments)

Why use a custom sample function instead of original HuggingFace generate() function?
#11 scarydemon2 closed 8 months ago (1 comment)

How to use HF Transformers model
#10 fakerybakery opened 8 months ago (3 comments)

Default `iteration` about SPIN. (Reward model~Policy model)
#9 KyujinHan closed 8 months ago (1 comment)

run spin demo
#8 westlongtime closed 8 months ago (3 comments)

The reward prompt is weak.
#7 Minami-su closed 8 months ago (6 comments)

Update README.md
#5 eltociear closed 8 months ago (1 comment)

Is this work in progress?
#4 jbdatascience closed 8 months ago (4 comments)

Help with Setting up and running ?
#3 badboysm890 closed 8 months ago (1 comment)

code and dataset？
#1 wanghao-007 closed 8 months ago (0 comments)