No additional changes are needed. Is your 'other' target obtained by overlaying 'drums', 'bass', and 'other'? It's best not to do this, as it can weaken the effect of data augmentation.
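For intuition, here is a minimal sketch of the kind of per-stem remix augmentation used in Demucs-style training pipelines (the function name and shapes are illustrative, not SCNet's actual `augment`): each stem is shuffled independently across the batch, so a pre-summed drums+bass+other stem sharply reduces the number of distinct mixtures the augmentation can produce.

```python
import torch

def remix(sources: torch.Tensor) -> torch.Tensor:
    # sources: (batch, stems, channels, time). Shuffling each stem
    # independently across the batch turns one batch of B songs into
    # up to B**stems distinct mixtures; pre-summing drums/bass/other
    # into a single stem cuts that from B**4 down to B**2.
    out = torch.empty_like(sources)
    for s in range(sources.shape[1]):
        perm = torch.randperm(sources.shape[0], device=sources.device)
        out[:, s] = sources[perm, s]
    return out
```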
Yes it is; that came logically to my mind, as there are two targets: the vocals and the mixture minus the vocals. Why is it not beneficial to do so? Isn't the augmentation done on a per-instrument basis? Should I then opt for only the vocals if I want to train a model that extracts the vocals and outputs the "rest"?
Thanks again!
It’s better to achieve the separation of vocals and other through modifications in the solver rather than by overlaying the data, unless you only have audio for "vocals" and "other".
```python
sources = sources.to(self.device)
if train:
    sources = self.augment(sources)
    mix = sources.sum(dim=1)
    # Training batches hold the four stems ['drums', 'bass', 'other', 'vocals'],
    # so drums/bass/other sit at indices 0-2 and vocals at index 3.
    other = sources[:, 0:3].sum(dim=1)
    vocals = sources[:, 3].unsqueeze(dim=1)
    sources = torch.cat([other.unsqueeze(dim=1), vocals], dim=1)
else:
    # Validation batches prepend the full mixture at index 0, shifting the
    # stems to indices 1-4; vocals therefore sits at index 4.
    mix = sources[:, 0]
    other = sources[:, 1:4].sum(dim=1)
    vocals = sources[:, 4].unsqueeze(dim=1)
    sources = torch.cat([other.unsqueeze(dim=1), vocals], dim=1)
```
-- clip --
Hmmm, the assertion on the following line fires because the sizes of sources and estimates differ, so the computation would not get past the subsequent spec_rmse_loss(...) call.
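Presumably the shapes disagree on the stem dimension: with the default config the model still emits four stems, while the rearranged targets contain only [other, vocals]. A self-contained illustration with made-up shapes:

```python
import torch

batch, channels, length = 4, 2, 44100
estimates = torch.zeros(batch, 4, channels, length)  # model still configured for 4 stems
sources = torch.zeros(batch, 2, channels, length)    # targets after the [other, vocals] rearrangement
assert estimates.shape == sources.shape  # AssertionError: dim 1 is 4 vs. 2
```

If that is the cause, the model's configured source list also needs to be reduced to two entries so its output head matches the two targets.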
OK, I don't know what happened with the initial OV configuration; I checked and reset everything, restarted the training run, and the training took off differently; model convergence now looks promising.
Thank you for all your help @starrytong.
I may close this ticket now.
I see there's a closed ticket about training targets other than the default 4 stems; a follow-up question:
I scripted the dataset for OV (other, vocals, mixture) and changed the config.yaml accordingly.
Is the solver index manipulation still required, as described here: https://github.com/starrytong/SCNet/issues/4?
The training and evaluation results don't seem right, e.g. the total and per-instrument NSDR values are negative or very low.
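For reference, a simplified sketch of what NSDR measures (not the repo's exact evaluation code): it is the SDR of the estimate minus the SDR of the raw mixture against the same reference, so persistently negative values mean the model's output is worse than the unprocessed mix, which is consistent with a stem/index mismatch rather than a merely under-trained model.

```python
import torch

def sdr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Signal-to-distortion ratio in dB, computed over the time axis.
    num = ref.pow(2).sum(dim=-1)
    den = (ref - est).pow(2).sum(dim=-1)
    return 10 * torch.log10((num + eps) / (den + eps))

def nsdr(est: torch.Tensor, mix: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Normalized SDR: improvement of the estimate over simply outputting the mixture.
    return sdr(est, ref) - sdr(mix, ref)
```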