Closed Saltb0xApps closed 3 weeks ago
Hi, thanks for using my code repo. I already wrote the dataloaders for the Slakh and MoisesDB datasets, which you can refer to. For a custom dataset, the data is returned in (input audio, output audio, instruction) format from the dataloader.
For the data, yes, this model requires paired data, which you can usually obtain from separate stems.
So if I understand correctly, putting in the same data that works for the base MusicGen model (instrumentals) will not work for this. We need a dataset that consists of -
What structure works best for the descriptions? For example -
Will dive further into the MoisesDB & Slakh datasets to really understand the details! Thank you :)
Since Instruct-MusicGen is a model for music editing, a training sample could be: (mix without stem, mix with stem, "instruct: add [] stem. "), (mix with stem, mix without stem, "instruct: remove [] stem. "), or (mix with stem, stem, "instruct: extract [] stem. ")
The original paper does not give a specific description for the stem, but I think it is okay to add a description to the instruction. You can try "An energetic hip hop track with guitars, piano, and drums. instruct: extract [] stem. ", "An energetic hip hop track. instruct: extract energetic [] stem. ", or "Music. instruct: extract energetic [] stem. "
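To make the pairing concrete, here is a minimal sketch of how the three edit types above could be generated from a song's stems. This is illustrative only: `make_edit_triplets` and the scalar/array audio representation are assumptions, not the repo's actual dataloader API, and a real pipeline would load and mix waveforms (e.g. from Slakh or MoisesDB stems) instead.

```python
# Hypothetical helper (not from the repo): builds (input audio, output audio,
# instruction) triplets for add / remove / extract edits from a stem dict.
def make_edit_triplets(stems):
    """stems maps a stem name (e.g. 'drums') to its audio, where audio
    values support '+' (e.g. NumPy arrays of equal length, or scalars
    here for illustration). Returns a list of
    (input_audio, output_audio, instruction) triplets."""
    triplets = []
    for name, stem in stems.items():
        # Mix of every other stem, i.e. the song without this stem.
        mix_without = sum(s for n, s in stems.items() if n != name)
        mix_with = mix_without + stem
        triplets.append((mix_without, mix_with, f"instruct: add {name} stem. "))
        triplets.append((mix_with, mix_without, f"instruct: remove {name} stem. "))
        triplets.append((mix_with, stem, f"instruct: extract {name} stem. "))
    return triplets
```

With two stems this yields six triplets (three edit types per stem), which matches the sample formats described above; a text description could then be prepended to each instruction string.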
Hey! Amazing work modifying MusicGen with instruct capabilities. I have a dataset of about 300k copyright-free audio files and I want to train the model from scratch.
I'm wondering whether training this requires just instrumental tracks with descriptions, or whether we need to train on individual stems (possibly split from the instrumentals using Demucs?).
It would be really helpful if you could share 2-5 examples that represent the quality and structure of the dataset used in the paper, like the audiocraft repo does - https://github.com/facebookresearch/audiocraft/tree/main/dataset/example