uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)
https://uclaml.github.io/SPPO/
Apache License 2.0

ShareGPT appending #4

Open Kquant03 opened 4 months ago

Kquant03 commented 4 months ago

How is the ShareGPT format handled in this workflow? I'm currently developing a dataset that could greatly benefit from this technique. However, I dislike training on "User" and "Assistant" tokens; it goes against my intentions when working with language models. With Axolotl, there's a way to change the header IDs for ShareGPT datasets. Is there something similar I could do here, or could I just do some data processing to change the format?
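
For the data-processing route, here is a minimal sketch of remapping ShareGPT-style roles to custom names before handing the data to a training pipeline. The function name, the `ROLE_MAP` choices, and the target message schema are all assumptions for illustration, not part of SPPO itself:

```python
# Hypothetical helper: remap ShareGPT "from" labels ("human"/"gpt") to
# custom role names, producing a flat list of {role, content} messages.
# The role names in ROLE_MAP are placeholders; pick whatever your
# tokenizer's chat template expects.

ROLE_MAP = {"human": "instruction", "gpt": "response", "system": "system"}

def convert_sharegpt(example: dict, role_map: dict = ROLE_MAP) -> list:
    """Turn one ShareGPT record into a list of {role, content} dicts."""
    return [
        {"role": role_map.get(turn["from"], turn["from"]),
         "content": turn["value"]}
        for turn in example.get("conversations", [])
    ]

record = {"conversations": [
    {"from": "human", "value": "Hello"},
    {"from": "gpt", "value": "Hi there"},
]}
messages = convert_sharegpt(record)
```

Running a pass like this over the dataset beforehand would sidestep the "User"/"Assistant" headers entirely, at the cost of keeping the remapping consistent between training and inference.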