tlc4418/llm_optimization issues (GitHub)

tlc4418 / llm_optimization

A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.

https://arxiv.org/abs/2310.02743

MIT License

25 stars, 1 fork

Issues


My BoN result is different from the result (Figure 3a) shown in the paper.

#13 by yiwan-rl, opened 3 weeks ago, 0 comments
Clarification: No Centering / Scaling / Standardizing of Ensembles' Rewards?

#12 by RylanSchaeffer, closed 1 month ago, 2 comments
Unknown split error when loading RM dataset `tlc4418/1.4b-policy_preference_data_gold_labelled`

#11 by JohannesAck, closed 3 weeks ago, 1 comment
Unable to Run PPO Training Using HuggingFace Path of SFT'd language model

#10 by RylanSchaeffer, opened 1 month ago, 1 comment
Best-of-n Pipeline takes ages - how to accelerate?

#9 by RylanSchaeffer, closed 3 weeks ago, 1 comment
When (not) to use Flash Attention?

#8 by RylanSchaeffer, closed 3 weeks ago, 1 comment
Best-of-n Pipeline: `ArrowInvalid: offset overflow while concatenating arrays`

#7 by RylanSchaeffer, closed 1 month ago, 2 comments
Why is reward model training not logged to W&B?

#6 by RylanSchaeffer, opened 1 month ago, 1 comment
Training Reward Model: AttributeError: 'Namespace' object has no attribute 'residual_dropout_lima'. Did you mean: 'residual_dropout'?

#5 by RylanSchaeffer, closed 1 month ago, 3 comments
How many GPUs (and VRAM) to use per stage?

#4 by RylanSchaeffer, closed 1 month ago, 2 comments
5 Installation Errors

#3 by RylanSchaeffer, closed 1 month ago, 2 comments
alpacafarm reward-model-human as gold reward

#2 by georgao35, opened 2 months ago, 4 comments
How to re-implement the score-KL curve?

#1 by zetian1025, closed 3 weeks ago, 2 comments