metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS
https://themetavoice.xyz/
Apache License 2.0

[CONTRIBUTION] Speech Dataset Generator for Metavoice #112

Open davidmartinrius opened 3 months ago

davidmartinrius commented 3 months ago

Hi everyone!

I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically with any audio or lists of audios.

This project creates MetaVoice datasets. You can pass your own files, YouTube links, TED talks, or LibriVox audiobooks as input, and it will create a dataset from them.

I hope you can find it useful.

Here are the key functionalities of the project:

  1. Dataset Generation: The project allows for the creation of datasets with Mean Opinion Score (MOS).

  2. Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.

  3. Sound Quality Improvement: The project focuses on improving the quality of the audio.

  4. Audio Segmentation: It can segment audio files within specified second ranges.

  5. Transcription: The project transcribes the segmented audio, providing a textual representation.

  6. Gender Identification: It identifies the gender of each speaker in the audio.

  7. Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.

  8. Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.

  9. Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.
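
To make item 4 more concrete, here is a minimal sketch of what segmenting audio "within specified second ranges" could look like. This is not code from the project: the function name `segment_by_duration`, the `(start, end)` tuple format, and the greedy merging strategy are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def segment_by_duration(utterances: List[Span], min_s: float, max_s: float) -> List[Span]:
    """Greedily merge consecutive utterances into segments whose
    duration lies within [min_s, max_s].

    Hypothetical helper, for illustration only. Gaps between merged
    utterances stay inside the segment, and any segment that ends up
    shorter than min_s or longer than max_s is dropped.
    """
    segments: List[Span] = []
    seg_start = seg_end = None

    for start, end in utterances:
        if seg_start is None:
            # Open the first segment.
            seg_start, seg_end = start, end
        elif end - seg_start <= max_s:
            # The utterance still fits: extend the current segment.
            seg_end = end
        else:
            # The utterance would overflow max_s: close the current
            # segment if its length is in range, then start a new one.
            if min_s <= seg_end - seg_start <= max_s:
                segments.append((seg_start, seg_end))
            seg_start, seg_end = start, end

    # Close the last open segment, if it is in range.
    if seg_start is not None and min_s <= seg_end - seg_start <= max_s:
        segments.append((seg_start, seg_end))

    return segments


if __name__ == "__main__":
    spans = [(0.0, 2.0), (2.5, 5.0), (5.5, 9.0), (9.5, 10.0), (30.0, 31.0)]
    print(segment_by_duration(spans, min_s=4.0, max_s=10.0))
```

With these made-up timestamps, the first four utterances merge into one 10-second segment, and the isolated 1-second utterance at the end is dropped because it is shorter than `min_s`.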

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator.

Actually, I was not planning to include the MetaVoice dataset format in the speech dataset generator, but @platform-kit asked me to implement it, so I did: https://github.com/speechbrain/speechbrain/discussions/2428#discussioncomment-8895311

maepopi commented 3 months ago

Oh wow, thank you for this!! I made an audio dataset manager myself a month ago, but I think yours is much more complete! Here's mine if you want to take a look and maybe merge the two together: mine is mostly designed to work with this repo, and it notably has a feature to correct JSON transcriptions and manage your dataset from a UI.

Thank you again, can't wait to test your tool!

davidmartinrius commented 3 months ago

Hi @maepopi, pull requests are welcome :) I have little time to add new features, as I am developing it in my free time. In any case, new features like yours are welcome. Thanks for sharing your tools.

maepopi commented 3 months ago

Hey there! Oh that’s great! I’ve never contributed to another repo before so that will be a first 😂 I’ll start by having a look and see if I can add my stuff, and I’ll keep you posted 😊

davidmartinrius commented 3 months ago

Great! How are you holding up? What key points do you think could be integrated into the project?

maepopi commented 3 months ago

Hey! Sorry, I haven't had time to take a look yet. I'll try some time this week or this weekend.

From what I've read in your readme, I think you integrated most of what I did in my tool like transcription and audio segmentation. I'm very curious to test your quality improvement feature, for I have a couple of audiobooks whose sound is really not great. In the end I think what I would add is my part about checking and fixing the transcription, but I might have to add an option to deal with CSV inputs instead of JSON.

Anyway, this is only speculation: as I said, I haven't tested your tool yet. I'll try to do that soon! Sorry