ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License

Singing #32

Closed bhack closed 4 years ago

bhack commented 7 years ago

It could be really interesting to train on solo singing tracks annotated with Lilypond. But it could be too hard to collect a dataset while handling copyright issues.

lemonzi commented 7 years ago

It'll probably be easier to find annotations in MIDI, there are not that many people transcribing melodies with Lilypond. The issue is usually to align the annotations to the audio; in speech recognition it's common to learn the alignment as well as the phoneme discrimination. Datasets from the MIREX challenges are probably a good starting point for the search.
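The annotation-to-audio alignment problem mentioned above is often tackled with dynamic time warping (DTW). As an illustration only (the sequences and absolute-difference cost are made-up toy assumptions, not tied to any dataset's tooling), a minimal pure-Python sketch could look like:

```python
# Minimal DTW sketch: align a reference annotation (e.g. MIDI note numbers)
# to an observed pitch track that may be time-stretched.
def dtw_align(ref, obs):
    """Return the accumulated DTW cost and the optimal alignment path."""
    n, m = len(ref), len(obs)
    INF = float("inf")
    # cost[i][j] = best cost aligning ref[:i] with obs[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - obs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1],      # advance obs only
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack to recover the matched index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Toy example: annotation vs. a slightly time-stretched performance.
ref = [60, 60, 62, 64, 64]
obs = [60, 60, 60, 62, 64, 64, 64]
total_cost, path = dtw_align(ref, obs)
# total_cost is 0.0 here, since obs only repeats ref values in order.
```

In practice the cost function would compare extracted F0 frames rather than symbolic notes, but the warping idea is the same.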


bhack commented 7 years ago

There is a Lilypond archive called the Mutopia project, but it is for classical music. For singing, a model could probably be trained using ground truth from the MIREX Voice Separation task.

sertansenturk commented 7 years ago

Another possible option is to contact the audio-to-lyrics alignment people who work on a cappella singing voice.

I'm not sure whether there are any easy-to-reach datasets, but @georgid might help you guys pinpoint some. In fact, he has a dataset too (https://github.com/MTG/turkish-makam-acapella-sections-dataset), but I'm not sure whether the size is adequate, and the music style may be too specific for your purposes...

basveeling commented 7 years ago

Take a look at MedleyDB, they have F0 annotations for individual stems for a couple hundred songs if I remember correctly: http://medleydb.weebly.com/

Edit: Sonic Visualiser is also a great tool for getting annotation data (F0 ground truth, chord estimates, etc.) from audio.
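Continuous F0 annotations like the ones mentioned above are typically (time, frequency-in-Hz) pairs, while conditioning often wants MIDI note numbers. A small sketch of the standard Hz-to-MIDI conversion (the sample frames are made up; the f0 = 0 convention for unvoiced frames is an assumption about the annotation format):

```python
import math

def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number.

    Unvoiced frames are often marked with f0 = 0; map those to None.
    """
    if f0_hz <= 0:
        return None
    # MIDI note 69 is A4 = 440 Hz; each octave spans 12 notes.
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)

# Example annotation frames: (time in seconds, f0 in Hz).
frames = [(0.00, 440.0), (0.01, 0.0), (0.02, 220.0)]
notes = [hz_to_midi(f) for _, f in frames]
# 440 Hz -> 69.0 (A4), 0 Hz -> None (unvoiced), 220 Hz -> 57.0 (A3)
```

Keeping the fractional part preserves vibrato and pitch bends, which matters for realistic singing synthesis.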

woodshop commented 7 years ago

Adding to @lemonzi's suggestion, I recommend checking out @craffel's recently released Lakh MIDI Dataset. It provides a large set of MIDI files that have been aligned to songs in the Million Song Dataset.

bhack commented 7 years ago

/cc @osageev

georgid commented 7 years ago

If you need only F0 annotations, then I can add to the already suggested ones: the RWC dataset (annotations in MIDI) and the iKala dataset. Maybe the only dataset big enough for DNNs, which is NOT annotated but might be worth annotating, is DAMP.

If you need lyrics annotations as well, have a look at this list: https://docs.google.com/document/d/1PAIshZ6ZpAl7ad2GiiT6Y3wFyrfFkSj4f0wok6yN4j0/edit?usp=sharing

shiba24 commented 7 years ago

Hi, I just tried training on birdsong, which is commonly used for the study of language in neuroscience, and it seems to learn really nicely. :) I wrote up a simple background and the results in my repo (https://github.com/shiba24/birdsong-generation-project). Although this is not human song or music, any feedback and/or comments are welcome if you find it interesting! Thank you.

bhack commented 7 years ago

A Neural Parametric Singing Synthesizer: https://arxiv.org/abs/1704.03809 and http://www.dtic.upf.edu/~mblaauw/IS2017_NPSS/