theadamsabra opened this issue 2 years ago
Hi! Thanks for your interest! Yes, the latter. MIDI-DDSP takes a monophonic MIDI melody as input and produces the generated audio as output.
Thank you so much for your prompt response. For training, should the output be the same melody as the MIDI input? Meaning, if I want to train on a new instrument, do I need the MIDI transcription?
Yes. You need paired MIDI and audio data to train MIDI-DDSP. MIDI-DDSP currently does not support training on datasets other than URMP, so you might need some hacks to do so. Lastly, MIDI-audio alignment quality will affect the generation quality of MIDI-DDSP, since the extraction of note expressions relies on the note boundaries.
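Since alignment quality matters so much here, a quick sanity check on each MIDI/audio pair can catch bad data before training. A minimal sketch (the note tuples and the `check_alignment` helper are hypothetical, not the actual MIDI-DDSP data format):

```python
# Hypothetical sanity check for paired MIDI/audio training data.
# Each note is an illustrative (start_sec, end_sec, pitch) tuple;
# this is not the real MIDI-DDSP input format.

def check_alignment(notes, num_samples, sample_rate):
    """Return True if every note falls within the audio duration
    and no notes overlap (i.e. the melody is monophonic)."""
    duration = num_samples / sample_rate
    prev_end = 0.0
    for start, end, pitch in notes:
        if start < prev_end:   # overlapping notes -> not monophonic
            return False
        if end > duration:     # note boundary extends past the audio
            return False
        prev_end = end
    return True

# Toy example: three notes in a 4-second clip at 16 kHz.
notes = [(0.0, 1.0, 60), (1.0, 2.5, 62), (2.5, 4.0, 64)]
print(check_alignment(notes, num_samples=64000, sample_rate=16000))  # True
```

A real pipeline would parse the note list from the MIDI file (e.g. with a MIDI library) rather than hard-code it, but the boundary checks would be the same.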
I see. Thank you!
How "accurate"/reliable was URMP in alignment quality? Also, do you use certain metrics to measure and assess alignment quality?
I don't have a metric for the alignment quality, but the MIDI (note boundaries) in the URMP dataset is manually labeled. I manually checked the MIDI alignment against the audio, and empirically I found the URMP dataset has very good alignment quality.
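If you did want a rough quantitative check instead of manual inspection, one simple option (not something the author used, and the `mean_onset_deviation` helper below is hypothetical) is the mean absolute deviation between MIDI note onsets and onsets detected from the audio:

```python
# A simple, hypothetical alignment metric: mean absolute deviation
# between labeled MIDI onsets and onsets detected from the audio.
# The MIDI-DDSP author checked alignment manually; this is only a sketch.

def mean_onset_deviation(midi_onsets, audio_onsets):
    """Pair each MIDI onset with its nearest detected audio onset
    and return the mean absolute time difference in seconds."""
    if not midi_onsets or not audio_onsets:
        raise ValueError("both onset lists must be non-empty")
    total = 0.0
    for m in midi_onsets:
        total += min(abs(m - a) for a in audio_onsets)
    return total / len(midi_onsets)

# Toy example: detected audio onsets are ~10 ms late on average.
midi_onsets = [0.0, 1.0, 2.5]
audio_onsets = [0.01, 1.01, 2.51]
print(round(mean_onset_deviation(midi_onsets, audio_onsets), 3))  # 0.01
```

In practice the audio onsets would come from an onset detector (e.g. one in librosa), and a mean deviation well under a typical note duration would suggest usable alignment.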
Thanks for all of your help. I would love to help out and improve the repository in any way. How difficult do you think it would be to allow training on arbitrary datasets?
Well... I have to confess that this codebase is not well written (by myself), so you will need some hacks. Here are some steps you should take:
If all of the above works, then I believe it can run on arbitrary datasets. This is on my to-do list, but I do not have the bandwidth to do so :(. Good luck with that!
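Whatever the exact hacks end up being, the starting point for any new dataset is enumerating matched MIDI/audio pairs. A minimal sketch, assuming a hypothetical layout with matching basenames (this is not the URMP or MIDI-DDSP directory structure):

```python
# Hypothetical dataset layout -- not the actual URMP/MIDI-DDSP format.
# Assumes matching basenames, e.g.:
#   data/midi/piece_01.mid  <->  data/audio/piece_01.wav
from pathlib import Path

def collect_pairs(root):
    """Return (midi_path, audio_path) pairs whose basenames match,
    skipping MIDI files that have no corresponding audio file."""
    root = Path(root)
    audio = {p.stem: p for p in (root / "audio").glob("*.wav")}
    pairs = []
    for midi in sorted((root / "midi").glob("*.mid")):
        if midi.stem in audio:
            pairs.append((midi, audio[midi.stem]))
    return pairs
```

From there, each pair would go through whatever preprocessing the URMP path does (alignment checks, feature extraction), which is where the dataset-specific hacking would happen.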
Quick question on this figure in the blog post: I know Coconet is its own model that will generate subsequent melodies given the input MIDI file. However, should I decide to train MIDI-DDSP, will the training of Coconet also be a part of this? Or should I expect a monophonic MIDI melody as input and the generated audio as output?
Thanks for all the help and this awesome project!