MusPy seems to have a really smooth pipeline for (down)loading a dataset and iterating over it or converting it to a PyTorch or TensorFlow dataset using one of the pre-defined representations. What I would like to do, which doesn't currently seem easy, is to create a new dataset object by transforming an existing dataset. An example use case would be to download the Lakh dataset, filter it by some criteria, split it into short segments, apply some data augmentation, and then use the result to train a PyTorch model. Maybe something like this:
```python
# transforms and saves the dataset, or reuses an existing result
lmd_split = lmd.transform(filter_and_split_fn, "data/lmd_split")
lmd_aug = lmd_split.transform(aug_fn, "data/lmd_aug")
lmd_aug.to_pytorch_dataset(representation="pianoroll")
```
where each transform function takes a single `Music` object and returns a list of `Music` objects.
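The map-and-flatten semantics I have in mind could be sketched like this (a pure-Python illustration with a stand-in dataset of integers instead of `Music` objects; `transform` and `filter_and_split` are hypothetical names, not part of MusPy's API):

```python
def transform(dataset, fn):
    """Apply fn (item -> list of items) to each element and flatten.

    Returning a list lets one transform both filter (empty list)
    and split/augment (multiple items) in a single pass.
    """
    out = []
    for item in dataset:
        out.extend(fn(item))
    return out

# Stand-in dataset: integers instead of Music objects.
dataset = [1, 2, 3]

def filter_and_split(x):
    # Drop odd values, split each even value into two halves.
    return [] if x % 2 else [x // 2, x - x // 2]

print(transform(dataset, filter_and_split))  # [1, 1]
```

A real implementation would presumably also persist the results to the given folder and skip recomputation when that folder already exists.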
Another (even more general, but maybe less efficient) possibility would be to be able to create a new dataset from a generator, e.g.:
```python
def g():
    for music in lmd:
        yield music.transpose(1)

lmd_aug = FolderDataset.from_generator(g(), "data/lmd_aug")
```
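Such a `from_generator` might work roughly as follows (a minimal sketch, not MusPy's API: the function name, one-file-per-item layout, and reuse-if-exists behavior are all assumptions; JSON stands in for however `Music` objects would actually be serialized):

```python
import json
import os


def from_generator(gen, root):
    """Materialize a generator of items into a folder, one file per
    item, and return the sorted list of written file paths.

    If the folder already exists, reuse its contents instead of
    consuming the generator again.
    """
    if os.path.isdir(root):
        return sorted(os.path.join(root, f) for f in os.listdir(root))
    os.makedirs(root)
    paths = []
    for i, item in enumerate(gen):
        path = os.path.join(root, f"{i:06d}.json")
        with open(path, "w") as f:
            json.dump(item, f)  # a real version would call music.save(path)
        paths.append(path)
    return paths
```

The advantage of the generator form is that transforms need not be one-to-many functions at all; any Python control flow can decide what gets yielded, at the cost of making it harder for the library to parallelize or cache the work.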