modelscope / data-juicer

Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Apache License 2.0
3.03k stars, 181 forks

sharegpt format support #488

Open IvanDeng0 opened 2 weeks ago

IvanDeng0 commented 2 weeks ago

Before Asking

Search before asking

Question

My dataset is in ShareGPT format, like this: [ {"conversations": [ {"from": "user", "value": "..."}, {"from": "gpt", "value": "..."} ] } ]

How should I set "text_keys" for this format?

Additional

No response

HYLcool commented 4 days ago

Hi @IvanDeng0, thanks for your suggestion!

As far as I know, the ShareGPT format is the same as the LLaVA format. If so, we already support it through the llava-to-dj and dj-to-llava format conversion tools. You can find them in the LLaVA-like format section of this document; please check whether they meet your needs. If not, let us know here and we can discuss what you need.
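For readers who only need a single text field to point a text key at, here is a minimal Python sketch of flattening ShareGPT-style records. It is not the llava-to-dj tool and does not reproduce its output; the function names `flatten_sharegpt` and `convert`, the `role: value` line layout, and the output field name `text` are all assumptions made for illustration.

```python
import json


def flatten_sharegpt(record, turn_sep="\n"):
    """Join all turns of one ShareGPT-style record into a single string.

    Hypothetical helper: the "role: value" layout and the "text" output
    field are illustrative assumptions, not the official tool's format.
    """
    turns = record.get("conversations", [])
    lines = [f'{turn["from"]}: {turn["value"]}' for turn in turns]
    return {"text": turn_sep.join(lines)}


def convert(records):
    """Flatten every record in a list of ShareGPT-style samples."""
    return [flatten_sharegpt(r) for r in records]


# A tiny sample in the shape shown in the question.
sample = [
    {
        "conversations": [
            {"from": "user", "value": "What is 2 + 2?"},
            {"from": "gpt", "value": "4"},
        ]
    }
]

flat = convert(sample)

# Serialize as JSON Lines: one flattened record per line.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in flat)
```

With a layout like this, the text key would simply name the flattened field. Whether that suits a given pipeline depends on how role information should be preserved, which is why the official conversion tools remain the first thing to check.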

If there is a need for SFT dataset format support, please follow up. @drcege