homebrewltd / llama3-s

Llama3.1 learns to Listen

Chore: Multi-voice audio generation #12

Open · hahuyhoang411 opened this issue 1 month ago

hahuyhoang411 commented 1 month ago

Motivation:

Goal:

  1. Expand the model to multiple speakers (different genders, regions, intonation, etc.)
  2. Create a v2 dataset: ~2B tokens?

Side idea:

  1. Overlap audio samples between speakers, e.g.:
    • Speaker 1: 100 samples
    • Speaker 2: 100 samples (20 of which overlap with Speaker 1)
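
One possible reading of the side idea, sketched below under assumptions the issue does not state: a "sample" is a text prompt to be voiced, every speaker gets the same number of prompts, and each speaker shares a fixed number of prompts with the previous one. The sketch generalizes the two-speaker example to a chain of speakers; `assign_with_overlap` and its parameters are hypothetical names, not part of the repo.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def assign_with_overlap(
    pool: Sequence[T],
    num_speakers: int,
    per_speaker: int,
    overlap: int,
    seed: int = 0,
) -> List[List[T]]:
    """Hypothetical helper: give each speaker `per_speaker` items, where
    speaker i+1 reuses `overlap` items from speaker i and draws the rest
    fresh from the pool. Only overlap with the *previous* speaker is fixed."""
    if num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")
    if not 0 <= overlap <= per_speaker:
        raise ValueError("overlap must be between 0 and per_speaker")
    fresh_needed = per_speaker + (num_speakers - 1) * (per_speaker - overlap)
    if fresh_needed > len(pool):
        raise ValueError(f"pool has {len(pool)} items, need {fresh_needed}")

    rng = random.Random(seed)  # seeded so the split is reproducible
    order = list(pool)
    rng.shuffle(order)

    # Speaker 1 takes the first `per_speaker` shuffled items.
    assignments = [order[:per_speaker]]
    cursor = per_speaker
    for _ in range(1, num_speakers):
        prev = assignments[-1]
        # Deliberate overlap: reuse `overlap` distinct items from the previous speaker.
        shared = rng.sample(prev, overlap)
        # Fresh items come from the unused part of the shuffled pool,
        # so they never collide with any earlier speaker's items.
        new = order[cursor : cursor + per_speaker - overlap]
        cursor += per_speaker - overlap
        assignments.append(shared + new)
    return assignments


if __name__ == "__main__":
    prompts = [f"utt-{i}" for i in range(200)]
    s1, s2 = assign_with_overlap(prompts, num_speakers=2, per_speaker=100, overlap=20)
    print(len(s1), len(s2), len(set(s1) & set(s2)))  # 100 100 20
```

Drawing fresh items from a single shuffled pool keeps the overlap between consecutive speakers exactly `overlap`; with three or more speakers, non-adjacent speakers may also share items through the chain, which may or may not be what the side idea intends.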

hungphongtrn commented 1 month ago

I created a dataset with one sample per unique speaker, to use as voice references for WhisperSpeech: unique_speaker_audio

Basically, I sampled data from openslr/librispeech_asr: for each unique speaker ID, I randomly picked one sample.

We have 921 unique speakers. You guys can submit your voice too :))
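
The per-speaker selection described in this comment could be sketched roughly as below. This is not necessarily how unique_speaker_audio was built: it is a minimal sketch assuming each row is a dict with a `speaker_id` field (as LibriSpeech rows on the Hugging Face Hub expose), and `one_sample_per_speaker` is a hypothetical helper. It uses reservoir sampling with a reservoir of size 1 per speaker, so it works in a single pass, e.g. over rows streamed with `datasets.load_dataset(..., streaming=True)`, without grouping the whole corpus in memory first.

```python
import random
from collections import defaultdict
from typing import Any, Dict, Iterable


def one_sample_per_speaker(
    rows: Iterable[Dict[str, Any]],
    key: str = "speaker_id",  # assumption: field name used by the rows
    seed: int = 0,
) -> Dict[Any, Dict[str, Any]]:
    """Hypothetical helper: pick one uniformly random row per distinct speaker.

    Per-speaker reservoir sampling (size 1): one pass, O(#speakers) memory.
    """
    rng = random.Random(seed)  # fixed seed -> same picks on every rerun
    seen: Dict[Any, int] = defaultdict(int)  # rows seen so far per speaker
    chosen: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        spk = row[key]
        seen[spk] += 1
        # Keep the n-th row for this speaker with probability 1/n; this leaves
        # each of the speaker's rows equally likely to be the final pick.
        if rng.randrange(seen[spk]) == 0:
            chosen[spk] = row
    return chosen


if __name__ == "__main__":
    # Toy rows standing in for dataset records (not real LibriSpeech data).
    rows = [{"speaker_id": s, "id": f"{s}-{i}"} for s in (19, 26, 27) for i in range(5)]
    picked = one_sample_per_speaker(rows)
    print(len(picked))  # 3
```

Grouping rows by speaker and calling `random.choice` on each group would give the same distribution; the reservoir version is shown only because it avoids holding every row in memory when iterating over a large or streamed dataset.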