The network architecture related to dual-path RNN TasNet.

yluo42 / TAC

transform-average-concatenate (TAC) method for end-to-end microphone permutation and number invariant ad-hoc beamforming.
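The TAC idea named in the repository description can be illustrated with a minimal PyTorch sketch. The transform, average, and concatenate steps follow the method's name. The following are assumptions for illustration, not the repository's implementation: the layer sizes, the PReLU activations, the residual connection, and the class name `TACSketch`. Normalization layers are also omitted.

```python
import torch
import torch.nn as nn


class TACSketch(nn.Module):
    """Illustrative transform-average-concatenate block (hypothetical sizes).

    Input shape: (batch, channels, time, features), where `channels` is the
    number of microphones and may differ between calls.
    """

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Transform: one projection shared by every microphone channel.
        self.transform = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.PReLU())
        # Projection applied to the cross-channel average.
        self.average = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.PReLU())
        # Concatenate per-channel and averaged features, then project back.
        self.concat = nn.Sequential(nn.Linear(2 * hidden_dim, feat_dim), nn.PReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.transform(x)                                # (B, C, T, H)
        z_avg = self.average(z.mean(dim=1, keepdim=True))    # (B, 1, T, H)
        z_cat = torch.cat([z, z_avg.expand_as(z)], dim=-1)   # (B, C, T, 2H)
        return x + self.concat(z_cat)                        # assumed residual


torch.manual_seed(0)
tac = TACSketch(feat_dim=8, hidden_dim=16)
x = torch.randn(2, 4, 10, 8)             # 4 microphones
perm = torch.tensor([2, 0, 3, 1])
# Permuting the microphones only permutes the outputs.
print(torch.allclose(tac(x)[:, perm], tac(x[:, perm]), atol=1e-5))  # True
# The same weights accept a different number of microphones.
print(tac(torch.randn(2, 6, 10, 8)).shape)  # torch.Size([2, 6, 10, 8])
```

Averaging is what makes the block indifferent to microphone order and count. The mean is symmetric in the channels, and every other layer is shared across channels.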


The network architecture related to dual-path RNN TasNet. #9

Closed: tky823 closed this issue 3 years ago

tky823 commented 3 years ago

I tried to reimplement dual-path RNN TasNet from your paper:

DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION

I understand this repository is written for TAC, but is the general structure published here the same as in dual-path RNN TasNet? There seem to be some improvements, such as the gated output layer. Are these modules included in dual-path RNN TasNet? https://github.com/yluo42/TAC/blob/96640a803b8193a7a507652c4a5693e57da03cbd/FaSNet.py#L15-L21

yluo42 commented 3 years ago

No, it's not exactly the same, because the problem definitions of TasNet and FaSNet are different: TasNet estimates multiplicative masks on the encoder outputs, while FaSNet estimates the time-domain beamforming filter coefficients. The gated output layer is used only in FaSNet, to keep the estimated beamforming filters within the range -1 to 1. Empirically, removing the gating layer or even using a linear output layer should give similar (or the same) performance.
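A gated output layer of this kind can be sketched as a tanh branch multiplied by a sigmoid gate. That product is what bounds the output to the range -1 to 1. This is a minimal sketch, assuming 1x1 `Conv1d` projections. The class name `GatedOutput` and the attribute names are hypothetical and may not match the linked `FaSNet.py`.

```python
import torch
import torch.nn as nn


class GatedOutput(nn.Module):
    """Tanh branch scaled by a sigmoid gate (illustrative sketch).

    tanh lies in (-1, 1) and the sigmoid gate in (0, 1), so every output
    element has magnitude at most 1 (float32 saturation can reach exactly 1).
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.output = nn.Sequential(nn.Conv1d(in_dim, out_dim, 1), nn.Tanh())
        self.output_gate = nn.Sequential(nn.Conv1d(in_dim, out_dim, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, frames) -> (batch, out_dim, frames)
        return self.output(x) * self.output_gate(x)


torch.manual_seed(0)
features = 100 * torch.randn(1, 64, 50)   # deliberately large activations
gated = GatedOutput(64, 32)
linear = nn.Conv1d(64, 32, 1)             # ungated, linear alternative

print(gated(features).abs().max().item() <= 1.0)  # bounded by construction
print(linear(features).abs().max().item())        # no such bound applies
```

The linear variant has no range guarantee. Per the reply above, this reportedly makes little difference to performance in practice.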

In DPRNN-TasNet, the output layer is simply self.output without the gating layer, and its nonlinearity can be either Sigmoid or ReLU (empirically, ReLU might be better).
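On the TasNet side, the output layer produces masks that multiply the encoder output. The following minimal sketch makes that contrast concrete. It uses hypothetical sizes, with random stand-in tensors in place of the real encoder and DPRNN outputs.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
n_filters, n_spk, feat_dim, frames = 64, 2, 64, 100

# DPRNN-TasNet style output layer: 1x1 conv + ReLU (or Sigmoid), no gate.
output_relu = nn.Sequential(nn.Conv1d(feat_dim, n_filters, 1), nn.ReLU())
output_sigmoid = nn.Sequential(nn.Conv1d(feat_dim, n_filters, 1), nn.Sigmoid())

torch.manual_seed(0)
enc_out = torch.relu(torch.randn(1, n_filters, frames))  # stand-in encoder output
sep_feats = torch.randn(n_spk, feat_dim, frames)          # stand-in per-speaker features

masks = output_relu(sep_feats)     # (n_spk, n_filters, frames), all >= 0
separated = enc_out * masks        # broadcast: one masked copy per speaker
# Each row of `separated` would then go through the decoder (ConvTranspose1d).
print(separated.shape)             # torch.Size([2, 64, 100])
```

ReLU masks are unbounded above, while Sigmoid masks stay in [0, 1]. The reply only says ReLU "might be better" empirically, so the choice is worth checking on your own data.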

tky823 commented 3 years ago

Thanks! I will read the FaSNet paper. So the network architecture of DPRNN-TasNet is something like this:

class DPRNNTasNet(...):
    def __init__(...):
        ...
        self.encoder = nn.Conv1d(...)
        self.enc_LN = nn.GroupNorm(...) # Layer normalization
        self.BN = nn.Conv1d(...) # Bottleneck convolution.
        self.DPRNN = DPRNN(...) # includes nn.PReLU() and nn.Conv2d(...) for mask estimation
        self.output = nn.Sequential(nn.Conv1d(...), nn.ReLU()) # empirically better
        # or self.output = nn.Sequential(nn.Conv1d(...), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(...)

Do I understand your explanation correctly?

yluo42 commented 3 years ago

Yes, it is something like that.
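The layout discussed in this thread can be filled in as a small runnable PyTorch sketch. Only the layer order comes from the pseudocode above. The following are assumptions and are not taken from the repository:

- the hyperparameter values;
- the simplified `DualPathBlock` (BiLSTM, linear projection, and GroupNorm with residual connections, in the spirit of the DPRNN paper);
- the `segment`/`overlap_add` helpers, which use 50% overlap and pad only at the end;
- the class names.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def segment(x: torch.Tensor, k: int):
    """Split (B, D, T) into 50%-overlapping chunks of length k -> (B, D, k, S)."""
    hop, t = k // 2, x.shape[-1]
    n_chunks = math.ceil(max(t - k, 0) / hop) + 1
    x = F.pad(x, (0, k + (n_chunks - 1) * hop - t))  # zero-pad the end only
    return x.unfold(-1, k, hop).transpose(2, 3).contiguous(), t


def overlap_add(chunks: torch.Tensor, t: int) -> torch.Tensor:
    """Inverse of segment: sum overlapping chunks back to (B, D, t)."""
    b, d, k, s = chunks.shape
    hop = k // 2
    out = chunks.new_zeros(b, d, k + (s - 1) * hop)
    for i in range(s):
        out[..., i * hop:i * hop + k] += chunks[..., i]
    return out[..., :t]


class DualPathBlock(nn.Module):
    """Hypothetical intra-chunk + inter-chunk BiLSTM block with residuals."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.intra_rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.intra_fc, self.intra_norm = nn.Linear(2 * hidden, dim), nn.GroupNorm(1, dim)
        self.inter_rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.inter_fc, self.inter_norm = nn.Linear(2 * hidden, dim), nn.GroupNorm(1, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, D, K, S)
        b, d, k, s = x.shape
        # Intra-chunk pass: one sequence of length K per chunk.
        y = x.permute(0, 3, 2, 1).reshape(b * s, k, d)
        y = self.intra_fc(self.intra_rnn(y)[0]).reshape(b, s, k, d).permute(0, 3, 2, 1)
        x = x + self.intra_norm(y)
        # Inter-chunk pass: one sequence of length S per position in the chunk.
        y = x.permute(0, 2, 3, 1).reshape(b * k, s, d)
        y = self.inter_fc(self.inter_rnn(y)[0]).reshape(b, k, s, d).permute(0, 3, 1, 2)
        return x + self.inter_norm(y)


class DPRNNTasNetSketch(nn.Module):
    def __init__(self, n_filters=64, kernel=16, bn_dim=64, hidden=128,
                 chunk=50, n_blocks=2, n_spk=2):
        super().__init__()
        self.n_filters, self.n_spk, self.chunk = n_filters, n_spk, chunk
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=kernel // 2, bias=False)
        self.enc_LN = nn.GroupNorm(1, n_filters, eps=1e-8)  # layer normalization
        self.BN = nn.Conv1d(n_filters, bn_dim, 1)            # bottleneck convolution
        # Dual-path blocks, then PReLU + Conv2d for mask estimation.
        self.DPRNN = nn.Sequential(
            *[DualPathBlock(bn_dim, hidden) for _ in range(n_blocks)],
            nn.PReLU(), nn.Conv2d(bn_dim, n_spk * bn_dim, 1))
        self.output = nn.Sequential(nn.Conv1d(bn_dim, n_filters, 1), nn.ReLU())
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=kernel // 2, bias=False)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:  # wav: (B, samples)
        w = self.encoder(wav.unsqueeze(1))                     # (B, N, T)
        chunks, t = segment(self.BN(self.enc_LN(w)), self.chunk)
        y = self.DPRNN(chunks)                                 # (B, C*bn_dim, K, S)
        b, _, k, s = y.shape
        y = overlap_add(y.reshape(b * self.n_spk, -1, k, s), t)  # (B*C, bn_dim, T)
        masks = self.output(y).reshape(b, self.n_spk, self.n_filters, t)
        masked = (w.unsqueeze(1) * masks).reshape(b * self.n_spk, self.n_filters, t)
        return self.decoder(masked).reshape(b, self.n_spk, -1)  # (B, C, samples)


model = DPRNNTasNetSketch()
est = model(torch.randn(2, 4000))  # two 4000-sample mixtures
print(est.shape)                   # torch.Size([2, 2, 4000])
```

The output has the same length as the input only when `samples - kernel` is divisible by `kernel // 2`, so other lengths should be padded first.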