homebrewltd / llama3-s

Llama3.1 learns to Listen

data: Integrate existing open source Voice datasets #65

Open bachvudinh opened 1 day ago

bachvudinh commented 1 day ago

Found a very interesting, high-quality dataset that was used to train GPT-Omni. We can rewrite the answers to be something like "Hello, I am Jade from Homebrew ...". Dataset link: https://huggingface.co/datasets/gpt-omni/VoiceAssistant-400K. Along with this, the assistant also needs function-calling ability, so I suggest these function-calling datasets:

dan-homebrew commented 1 day ago

@bachvudinh I've renamed the issue to be more specific to datasets.