Closed: rabitt closed this issue 3 years ago
just pasting this here from the PR:
I want to be able to use these files for source separation, and I think a simple sum of audios is not enough. What I need:
- sources (OrderedDict): dictionary comprising Source objects; one Source represents one audio file (we have multiple audio files per Track)
- targets (OrderedDict): dictionary comprising Target objects; one Target is a linear mix of multiple sources. Think of these as groups: for instance voice, drums, bass, and other are groups, but so is accompaniment as the drums+bass+other mix
- mix (Target): a Target object that is a linear mix of all sources. This is what you probably suggested as a simple sum of sources

I could have done it in a similar way to musdb_sources, but that requires a similar implementation of Target and Source after calling the mirdata object. The thing we want to avoid is non-standard ways of working with a dataset, so that's why I proposed these Target and Source objects: so everyone mixes tracks in the same way. This code is derived from musdb; I think it's one of the best examples of implementing multi-tracks for a given Track (if you look at their repo, the Multi-Track object is the same as our Track).

Let's see where the discussion goes; I am open to some changes. By the way, there is a possibility that each Source is multi-channel (stereo or more). In the original experiments, phenicx-anechoic was multi-channel, having 6 mics and a source .wav file for each mic. I didn't implement this because it was already too much, but once this is solved I will start working on it.
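To make the proposal concrete, here is a minimal sketch of how Source, Target, and mix could relate. Class and attribute names follow the description above but are assumptions, not the actual PR code:

```python
from collections import OrderedDict

import numpy as np


class Source:
    """One source: a single audio file (stem). Hypothetical sketch."""

    def __init__(self, name, audio, sample_rate=44100):
        self.name = name
        self.audio = np.asarray(audio, dtype=float)
        self.sample_rate = sample_rate


class Target:
    """A linear mix of several Source objects (a 'group')."""

    def __init__(self, sources, weights=None):
        self.sources = list(sources)
        self.weights = weights if weights is not None else [1.0] * len(self.sources)

    @property
    def audio(self):
        # Linear mix: weighted sum of the source signals
        return sum(w * s.audio for w, s in zip(self.weights, self.sources))


# One Source per stem audio file
sources = OrderedDict(
    (name, Source(name, np.random.randn(8))) for name in ["voice", "drums", "bass", "other"]
)

# Targets are groups of sources, mixed on the fly
targets = OrderedDict(
    voice=Target([sources["voice"]]),
    accompaniment=Target([sources["drums"], sources["bass"], sources["other"]]),
)

# The mix is just the Target containing all sources
mix = Target(list(sources.values()))
```

With this structure, every dataset would produce mixtures the same way: a Target is always a weighted sum of its Sources, and the full mix is the Target holding all of them.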
Correct me if I'm misunderstanding something here - as I understand it:
- sources: data loaded from audio files
- targets: a submix of several sources, which can be created "on the fly" by summing a specified subset of sources
- mix: not loaded from an audio file, but the sum of the sources, which will always be the same. Equivalent to a target with all sources
If that's all consistent with what you're saying, I think having a MultiTrack object in mirdata which is able to create targets/mix in a consistent way should be totally doable.
Bringing up GuitarSet to the discussion because it has multitrack audio, in particular 6 channels. Whatever we decide to do we should be consistent and propagate changes there. If we go as in GuitarSet, minor changes on the API are needed. Comments on how multi-track is done there:

Multi-track audio and mixtures: In GuitarSet, the multitrack audio is loaded with librosa, resulting in a tuple (np.array, sr), where the np.array has shape (6, samples). Mixtures are pre-made so they're simply loaded, but note that they could be created on the fly in an attribute loader method (e.g. create an attribute mix, and create the mix when loading in a systematic manner).
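The loading and on-the-fly mixing described above could be sketched like this. Synthetic audio stands in for what `librosa.load(path, sr=None, mono=False)` returns for a 6-channel recording, so the sketch is self-contained:

```python
import numpy as np

# Stand-in for librosa.load(path, sr=None, mono=False), which for a
# 6-channel GuitarSet recording returns (audio, sr) with audio of
# shape (6, samples). Synthetic data here, not a real file.
sr = 44100
audio = np.random.randn(6, sr)  # 6 channels, 1 second


def make_mix(multichannel_audio, weights=None):
    """Create a mixture on the fly by (weighted) channel summation."""
    if weights is None:
        weights = np.ones(multichannel_audio.shape[0])
    # Weighted sum over the channel axis -> shape (samples,)
    return np.tensordot(weights, multichannel_audio, axes=1)


mix = make_mix(audio)
```

An attribute loader on the Track could call something like `make_mix` the first time `mix` is accessed, so mixtures never need to be shipped as separate files.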
to_jams() method: GuitarSet released JAMS annotations, so to_jams() simply loads them. Opening the resulting jams object, the way they solved the multi-track issue was by adding multiple annotation types and indicating the Data source field from [0, .. , 6]. I think that in any case we should modify to_jams() to do something similar, i.e. to support multi-tracks.
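A minimal sketch of that pattern, using plain dicts instead of the jams package so it runs standalone (the field layout mirrors JAMS, where each annotation's metadata carries a data_source value identifying the channel/string it came from):

```python
# Hypothetical helper: one annotation per string, distinguished only by
# the data_source field in its metadata, as GuitarSet's JAMS files do.
def note_annotation(string_index, notes):
    return {
        "namespace": "note_midi",
        "annotation_metadata": {"data_source": str(string_index)},
        "data": notes,
    }


# Six annotations of the same type, one per channel/string
annotations = [note_annotation(i, []) for i in range(6)]

# Consumers can then select the annotation for a given channel:
per_string = {a["annotation_metadata"]["data_source"]: a for a in annotations}
```

A multi-track-aware to_jams() could emit this shape directly: same namespace repeated, with data_source disambiguating the channels.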
So I'm not entirely sure if the Multi-track object is needed. We could instead:
- Load the multitrack in the track object as GuitarSet does
- Create mixtures in a systematic way on the fly
- Modify to_jams to support multi-track annotations similarly to GuitarSet
Am I missing any other functionality we need from multi-tracks? I'm a bit unsure about modifying indexes and so on. Thoughts @rabitt @nkundiushuti ?
Will be closed by #304
@magdalenafuentes, in guitarset each Track may actually be a single track with multi-channel audio, but it can also be considered a Track with a corresponding annotation. I think it depends on the people who use it for their research: would it be more useful for them to have it as a MultiTrack (in my case I need multi-track to create various targets on the fly), or do they just need to quickly load the 6 audios and the mix?
"sources: data loaded from audio files; targets: a submix of several sources, which can be created 'on the fly' by summing a specified subset of sources; mix: not loaded from an audio file, but the sum of the sources, which will always be the same, equivalent to a target with all sources" <- this is slightly biased by source separation research.
@magdalenafuentes, in guitarset the Tracks may actually be a single track with multi-channel audio but can also be considered as a Track with a corresponding annotation.
+1! I consider guitarset to be a dataset of tracks with multichannel audio, rather than a multitrack dataset. I don't think on-the-fly mixing is a big use case for this dataset, in particular since the channels bleed into one another.
By the way, this is now closed by #304
How can we best support multitrack datasets? The current solution is to have each Track be a multitrack with a ton of attributes, but it's clunky and difficult to take full advantage of the multitracks themselves. We're also loosely tying a Track to something that can be mapped to a jams file/object, and jams isn't built for multiple audio files.

My current thinking is we could index at the stem level, and support a new base class MultiTrack which would group multiple Track objects. The grouping could be stored as part of the index, and any mixture-level annotations could themselves be stored as a Track object and be part of the multitrack object.

Thoughts?
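A rough sketch of what that could look like, with the grouping stored in the index. All names here (the index layout, the class attributes) are assumptions for illustration, not mirdata's actual API:

```python
class Track:
    """A single stem (or the mixture) with its own audio path."""

    def __init__(self, track_id, audio_path):
        self.track_id = track_id
        self.audio_path = audio_path


class MultiTrack:
    """Groups several Track objects; the grouping lives in the index."""

    def __init__(self, mtrack_id, index):
        self.mtrack_id = mtrack_id
        entry = index["multitracks"][mtrack_id]
        # Each stem is itself a Track, looked up at the stem level
        self.tracks = {
            tid: Track(tid, index["tracks"][tid]["audio"]) for tid in entry["tracks"]
        }
        # Mixture-level annotations/audio are stored as just another Track
        self.mix_track = Track(entry["mix"], index["tracks"][entry["mix"]]["audio"])


# Hypothetical index: stem-level entries plus a multitrack grouping
index = {
    "tracks": {
        "song1-violin": {"audio": "song1/violin.wav"},
        "song1-cello": {"audio": "song1/cello.wav"},
        "song1-mix": {"audio": "song1/mix.wav"},
    },
    "multitracks": {
        "song1": {"tracks": ["song1-violin", "song1-cello"], "mix": "song1-mix"}
    },
}

mtrack = MultiTrack("song1", index)
```

The appeal of this shape is that nothing about Track changes: a MultiTrack is pure grouping, and the mixture is a first-class Track rather than a special case.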
cc @nkundiushuti - we could test this out with Phenicx-Anechoic #270