
mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training

https://streaming.docs.mosaicml.com

Apache License 2.0


Does it support Preference data (for training Reward / DPO)? #656

Status: Open (opened by ericxsun 7 months ago)

ericxsun commented 7 months ago

🚀 Feature Request

The preference data looks like this:

{
    "chosen":
    [
        {"role": "user", "content": "abcd"},
        {"role": "assistant", "content": "abcef"},
        ...
    ],
    "rejected":
    [
        {"role": "user", "content": "abcd"},
        {"role": "assistant", "content": "abcef"},
        ...
    ]
}

This data is used to train a reward model or for DPO.

I'm wondering whether streaming can be used for this kind of data, and if so, how. Thanks very much.

XiaohanZhangCMU commented 7 months ago

Yes, it should work out of the box. Use MDSWriter to convert your preference data to MDS, then create a StreamingDataset from it. Use json as the column encoding. Let me know if you run into any issues.

ericxsun commented 7 months ago


Thanks for the quick explanation, I'll try it.

karan6181 commented 5 months ago

@ericxsun Have you tried @XiaohanZhangCMU's suggestion? Did it work?

XiaohanZhangCMU commented 2 months ago

Hi @ericxsun, I wanted to follow up here before closing this issue.