uclaml SPPO issues - Githubissues

uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)

https://uclaml.github.io/SPPO/

Apache License 2.0

477 stars 61 forks

issues

Added a new PR to allow generation on fewer than 8 GPUs

#25 aman2304 opened 1 month ago
0
Modify generate.sh to take a dynamic number of GPUs as input

#24 aman2304 opened 1 month ago
1
Typo here

#23 xukp20 closed 1 month ago
1
DPO baseline implementation

#22 yesiam-png opened 2 months ago
0
SPPO Implementation on Axolotl!

#21 kaykyr opened 2 months ago
0
Flexible GPU specification in generate.sh

#19 xiaohangt closed 2 months ago
2
Adaptation for 4-bit Quantization Training/Responses Generation (with 2 Home GPUs)

#16 kaykyr closed 2 months ago
1
Scores and probability calculations

#15 namdw opened 2 months ago
4
What's the package configuration for reproducing SPPO-Gemma-2?

#14 Jackory opened 2 months ago
1
Any chance it works on my homelab?

#13 kaykyr closed 2 months ago
3
Dataset used and results for Gemma-2-9B

#12 hodachi-axcxept closed 2 months ago
13
Improve and Optimize DPO Trainer Code

#11 sanowl closed 2 months ago
0
Ranking speed & training hyperparameters

#10 skramer-dev opened 3 months ago
0
chore: update trainer.py

#9 eltociear closed 3 months ago
0
Is it normal for the pipeline to start with a huge loss?

#8 qy1026 opened 3 months ago
3
Some packages' versions are too old

#7 qy1026 opened 3 months ago
0
Questions about the training code

#6 blackblue9 closed 3 months ago
1
Which version of vllm should be installed?

#5 xinghuang2050 opened 3 months ago
4
ShareGPT appending

#4 Kquant03 opened 3 months ago
0
Suggestion: Gemma 2 9B and 27B.

#3 kaykyr closed 2 months ago
2
ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError)

#2 xinghuang2050 opened 3 months ago
2
Is it possible to run llama 3-70B and/or mixtral 8x22b through this process?

#1 RandomInternetPreson opened 3 months ago
1