ws-choi / Conditioned-Source-Separation-LaSAFT

A PyTorch implementation of the paper: "LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation" (ICASSP 2021)
MIT License

length difference between input and output signal #16

Closed sun-peach closed 3 years ago

sun-peach commented 3 years ago

I trained my own model and followed the quick start demo, "Quickstart: How to use Pretrained Models". However, I find that the input and output signals have different lengths. Is this expected?

When I trained the model, I tuned the trim length, hop length, and window length.

Thank you.

ws-choi commented 3 years ago

Hi @sun-peach,

TL;DR: right, the lengths of the input and output might be different.


separate

At the low level, our model is designed to take a fixed-length input, so the function separate takes an input of fixed length. https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/a3e60bfdc1d5b4d20f5d5df852241a0c8d80420a/lasaft/source_separation/conditioned/cunet/dcun_base.py#L191

If you use separate_track, as in the QuickStart, you don't have to split the original track manually.

separate_track

https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/a3e60bfdc1d5b4d20f5d5df852241a0c8d80420a/lasaft/source_separation/conditioned/cunet/dcun_base.py#L252

separate_track automatically splits the input into sub-chunks that fit separate. This operation is encapsulated in the SingleTrackSet object.

https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/a3e60bfdc1d5b4d20f5d5df852241a0c8d80420a/lasaft/source_separation/conditioned/cunet/dcun_base.py#L258

SingleTrackSet

SingleTrackSet automatically splits the input into chunks. For each chunk, it appends padding if the chunk's length does not match the fixed length the model expects. https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/a3e60bfdc1d5b4d20f5d5df852241a0c8d80420a/lasaft/data/musdb_wrapper.py#L272
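
To illustrate the effect of chunking with padding, here is a minimal NumPy sketch. It is not the actual SingleTrackSet code: the function name chunk_with_padding, the chunk length of 4096, and zero-padding only at the end of the last chunk are all assumptions made for the example. The real implementation may pad differently, for example at both ends or with overlap between chunks.

```python
import numpy as np

def chunk_with_padding(signal, chunk_len):
    """Split a [channel, T] array into fixed-length chunks (hypothetical helper).

    The last chunk is zero-padded on the right so that every chunk
    has exactly `chunk_len` samples.
    """
    total = signal.shape[1]
    chunks = []
    for start in range(0, total, chunk_len):
        chunk = signal[:, start:start + chunk_len]
        missing = chunk_len - chunk.shape[1]
        if missing > 0:
            # Pad only the time axis; np.pad defaults to constant zeros.
            chunk = np.pad(chunk, ((0, 0), (0, missing)))
        chunks.append(chunk)
    return chunks

signal = np.random.randn(2, 10_000)           # stereo input, T = 10000
chunks = chunk_with_padding(signal, 4_096)    # 3 chunks of 4096 samples
rejoined = np.concatenate(chunks, axis=1)
print(signal.shape, rejoined.shape)           # (2, 10000) (2, 12288)
```

In this toy setup the rejoined signal is 2288 samples longer than the input, and the extra samples are exactly the padding added to the last chunk.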

This is why the lengths of the input and output might be different. If you want them to have the same length, try:

input_length = input.shape[1] # input => numpy array with shape: [channel, T]
output = output[:,:input_length]
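
If you are not sure which axis holds time in your arrays, a slightly more general version of the same trimming can be sketched as below. The helper match_length and its time_axis parameter are hypothetical names introduced for this example and are not part of the LaSAFT code base. It also zero-pads when the output happens to be shorter, which is an assumption about what you would want in that case.

```python
import numpy as np

def match_length(output, reference, time_axis=-1):
    """Trim or zero-pad `output` along `time_axis` to the length of `reference`.

    Hypothetical helper for illustration; not part of LaSAFT.
    """
    target = reference.shape[time_axis]
    current = output.shape[time_axis]
    if current >= target:
        # Keep only the first `target` samples along the time axis.
        index = [slice(None)] * output.ndim
        index[time_axis] = slice(0, target)
        return output[tuple(index)]
    # Output is shorter than the reference: pad zeros at the end.
    pad = [(0, 0)] * output.ndim
    pad[time_axis] = (0, target - current)
    return np.pad(output, pad)

mixture = np.zeros((2, 10_000))      # [channel, T]
separated = np.ones((2, 12_288))     # longer because of padding
trimmed = match_length(separated, mixture)
print(trimmed.shape)                 # (2, 10000)

# Same idea for a [T, channel] layout: pass time_axis=0.
trimmed_t = match_length(separated.T, mixture.T, time_axis=0)
print(trimmed_t.shape)               # (10000, 2)
```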

sun-peach commented 3 years ago

Got it. Have done that. Thank you.