princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward
MIT License

Can you share the loss log of `Llama-3-8B-Instruct`? #45

Closed by AIR-hl 1 month ago

AIR-hl commented 1 month ago

I am reproducing the experiment with trl on a smaller device, so I would like to compare against the official training log to check whether my run is behaving normally.

yumeng5 commented 1 month ago

Hi,

Please find the training curves here: https://wandb.ai/yumeng0818/simpo

Best, Yu

Xalp commented 1 month ago

Hi, could you also share the loss log for llama-3-8b-it v0.2?

yumeng5 commented 1 month ago

@Xalp The logs can be found here:

https://wandb.ai/yumeng0818/simpo/runs/zvv56fcj?nw=nwuseryumeng0818