mir-dataset-loaders / mirdata

Python library for working with Music Information Retrieval datasets
https://mirdata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Multitrack Datasets #276

Closed rabitt closed 3 years ago

rabitt commented 4 years ago

How can we best support multitrack datasets? The current solution is to have each 'Track' be a multitrack with a ton of attributes, but it's clunky and makes it difficult to take full advantage of the multitracks themselves. We're also loosely tying a Track to something that can be mapped to a jams file/object, and jams isn't built for multiple audio files.

My current thinking is that we could index at the stem level and support a new base class MultiTrack, which would group multiple Track objects. The grouping could be stored as part of the index, and any mixture-level annotations could themselves be stored as a Track object that is part of the multitrack.
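For concreteness, here's a minimal sketch of what such a base class could look like. Everything in it (the class name, the `load_track` callable, the index-driven grouping) is hypothetical, not existing mirdata API:

```python
class MultiTrack(object):
    """Hypothetical grouping of several Track objects.

    The mtrack_id -> track_ids mapping would come from the index, and
    mixture-level annotations would live in an ordinary Track object.
    """

    def __init__(self, mtrack_id, track_ids, load_track):
        self.mtrack_id = mtrack_id
        # one Track object per stem, keyed by track_id
        self.tracks = {track_id: load_track(track_id) for track_id in track_ids}

    @property
    def track_ids(self):
        return sorted(self.tracks.keys())
```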

Thoughts?

cc @nkundiushuti - we could test this out with Phenicx-Anechoic #270

nkundiushuti commented 4 years ago

just pasting this here from the PR:

I want to be able to use these files for source separation, and I think a simple sum of audios is not enough. What I need:

- sources (OrderedDict): dictionary comprising Source objects; one Source represents one audio file (we have multiple audio files per Track)
- targets (OrderedDict): dictionary comprising Target objects; one Target is a linear mix of multiple sources. Think of these as groups: for instance, voice, drums, bass, and other are groups, but so is accompaniment as a drums+bass+other mix
- mix (Target): a Target object that is a linear mix of all sources; this is what you probably suggested as a simple sum of sources
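Roughly, that scheme could look like the following sketch (simplified constructors; the actual musdb-derived code has more to it):

```python
from collections import OrderedDict

import numpy as np


class Source(object):
    """One Source corresponds to one audio file (stem)."""

    def __init__(self, name, audio):
        self.name = name
        self.audio = audio  # np.ndarray, shape (channels, samples)


class Target(object):
    """A linear mix of several sources, e.g. accompaniment = drums + bass + other."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = sources  # OrderedDict of Source objects

    @property
    def audio(self):
        # unweighted linear sum; assumes all stems share length and sample rate
        return np.sum([src.audio for src in self.sources.values()], axis=0)


# e.g. accompaniment as a drums+bass+other submix, given a `stems` OrderedDict:
# accompaniment = Target("accompaniment", OrderedDict(
#     (k, stems[k]) for k in ("drums", "bass", "other")))
```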

I could have done it in a similar way to musdb_sources, but that requires a similar implementation of Target and Source after calling the mirdata object. The thing we want to avoid is non-standard ways of working with a dataset, which is why I proposed the Target and Source objects: so everyone mixes tracks in the same way. This code is derived from musdb, which I think is one of the best examples of implementing multitracks for a given Track (if you look at their repo, their MultiTrack object is the same as our Track).

Let's see where the discussion goes; I am open to some changes. By the way, there is a possibility that each Source is multi-channel (stereo or more). In the original experiments, PHENICX-Anechoic was multichannel, with 6 mics and a source .wav file for each mic. I didn't implement this because it was already too much, but once this is solved I will start working on it.

rabitt commented 4 years ago

Correct me if I'm misunderstanding something here. As I understand it:

- sources: data loaded from audio files
- targets: a submix of several sources, which can be created "on the fly" by summing a specified subset of sources
- mix: not loaded from an audio file, but the sum of the sources, which will always be the same; equivalent to a target with all sources

If that's all consistent with what you're saying, I think having a MultiTrack object in mirdata which is able to create targets/mix in a consistent way should be totally doable.
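For example (a sketch, assuming hypothetically that each Track exposes an `audio` property returning a `(y, sr)` tuple and that all stems share a sample rate and length):

```python
import numpy as np


def get_target(tracks, track_keys, weights=None):
    """Create a submix ("target") on the fly by summing a subset of tracks."""
    if weights is None:
        weights = [1.0] * len(track_keys)
    # tracks[k].audio is assumed to return (y, sr); we sum the signals
    return np.sum(
        [w * tracks[k].audio[0] for w, k in zip(weights, track_keys)], axis=0
    )


def get_mix(tracks):
    """The mix is equivalent to a target over all of the sources."""
    return get_target(tracks, list(tracks.keys()))
```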

magdalenafuentes commented 4 years ago

Bringing GuitarSet into the discussion because it has multitrack audio, in particular 6 channels. Whatever we decide to do, we should be consistent and propagate the changes there. If we follow GuitarSet's approach, only minor changes to the API are needed. Comments on how multitrack is handled there:

Multi-track audio and mixtures: In GuitarSet, the multitrack audio is loaded with librosa, resulting in a tuple (np.array, sr), where the np.array has shape (6, samples). Mixtures are pre-made, so they're simply loaded, but note that they could be created on the fly in an attribute loader method (e.g., add a mix attribute and create the mix at load time in a systematic manner).
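For reference, the multichannel load amounts to something like this (the path is a placeholder):

```python
import librosa

# mono=False preserves the channels: for GuitarSet's hexaphonic audio,
# y has shape (6, n_samples) and sr is the native sample rate
y, sr = librosa.load("some_track_hex.wav", sr=None, mono=False)
```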

to_jams() method: GuitarSet released JAMS annotations, so to_jams() simply loads them. Opening the resulting jams object, the way they solved the multitrack issue was by adding multiple annotations per type and using the data_source field to indicate the channel, with values from [0, .., 6]. I think that in any case we should modify to_jams() to do something similar, i.e. to support multitracks.
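In jams terms, that convention looks roughly like this (the namespace, duration, and values are illustrative, not taken from an actual GuitarSet file):

```python
import jams

jam = jams.JAMS()
jam.file_metadata.duration = 10.0  # placeholder

# one annotation per string/channel, distinguished via data_source
for channel in range(6):
    ann = jams.Annotation(namespace="note_midi", duration=10.0)
    ann.annotation_metadata.data_source = str(channel)
    jam.annotations.append(ann)
```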

So I'm not entirely sure the MultiTrack object is needed. We could instead:

  1. Load the multitrack audio in the Track object, as GuitarSet does
  2. Create mixtures in a systematic way on the fly
  3. Modify to_jams() to support multi-track annotations, similarly to GuitarSet

Am I missing any other functionality we need from multi-tracks? I'm a bit unsure about modifying indexes and so on. Thoughts @rabitt @nkundiushuti ?

rabitt commented 3 years ago

Will be closed by #304

nkundiushuti commented 3 years ago

> Bringing GuitarSet into the discussion because it has multitrack audio, in particular 6 channels. [...] Am I missing any other functionality we need from multi-tracks? I'm a bit unsure about modifying indexes and so on. Thoughts @rabitt @nkundiushuti?

@magdalenafuentes, in GuitarSet the Tracks may actually be a single track with multichannel audio, but they can also be considered a Track with a corresponding annotation. I think it depends on the people who use it for their research: would it be more useful for them to have it as a MultiTrack (in my case I need a multitrack to create various targets on the fly), or do they just need to quickly load the 6 audios and the mix?

> sources: data loaded from audio files
> targets: a submix of several sources, which can be created "on the fly" by summing a specified subset of sources
> mix: not loaded from an audio file, but the sum of the sources, which will always be the same; equivalent to a target with all sources

This is slightly biased by source separation research.

rabitt commented 3 years ago

> @magdalenafuentes, in GuitarSet the Tracks may actually be a single track with multichannel audio, but they can also be considered a Track with a corresponding annotation.

+1! I consider GuitarSet to be a dataset of tracks with multichannel audio, rather than a multitrack. I don't think on-the-fly mixing is a big use case for this dataset, in particular since the channels bleed into one another.

By the way, this is now closed by #304