tky823 closed this issue 3 years ago
No, it's not completely the same, because the problem definitions of TasNet and FaSNet are different: TasNet attempts to estimate multiplicative masks on the encoder outputs, while FaSNet attempts to estimate the time-domain beamforming filter coefficients. The gated output layer is only applied in FaSNet, to ensure that the dynamic range of the estimated beamforming filters is between -1 and 1. Empirically, though, removing the gating layer, or even using a linear output layer, should give similar (or the same) performance.
In DPRNN-TasNet, the output layer is simply self.output without the gating layer, and the nonlinearity for the output layer can be either Sigmoid or ReLU (empirically, ReLU might be better).
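To make the difference between the two output heads concrete, here is a minimal sketch rather than the repository's code. It assumes the gate is a tanh branch multiplied by a sigmoid branch, which is one common way to keep values between -1 and 1. The names GatedOutput and plain_output, and the sizes feature_dim and output_dim, are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
feature_dim, output_dim = 64, 16


class GatedOutput(nn.Module):
    """FaSNet-style head (assumed form): tanh branch scaled by a sigmoid gate.

    tanh lies in [-1, 1] and the sigmoid gate in [0, 1], so their product
    stays in [-1, 1], a range suited to filter coefficients.
    """

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.output = nn.Sequential(nn.Conv1d(in_dim, out_dim, 1), nn.Tanh())
        self.output_gate = nn.Sequential(nn.Conv1d(in_dim, out_dim, 1), nn.Sigmoid())

    def forward(self, x):
        return self.output(x) * self.output_gate(x)


# TasNet-style head: a single branch with a non-negative nonlinearity,
# because the result is used as a multiplicative mask on the encoder output.
plain_output = nn.Sequential(nn.Conv1d(feature_dim, output_dim, 1), nn.ReLU())

features = torch.randn(2, feature_dim, 100)  # (batch, channels, frames)
filters = GatedOutput(feature_dim, output_dim)(features)
masks = plain_output(features)

print("filter range:", filters.min().item(), filters.max().item())
print("mask minimum:", masks.min().item())
```

Replacing nn.ReLU() with nn.Sigmoid() in plain_output gives the other mask variant mentioned above, with values bounded between 0 and 1.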
Thanks! I will read the FaSNet paper. So, the network architecture of DPRNN-TasNet is something like this:
class DPRNNTasNet(...):
    def __init__(...):
        ...
        self.encoder = nn.Conv1d(...)
        self.enc_LN = nn.GroupNorm(...)  # Layer normalization
        self.BN = nn.Conv1d(...)  # Bottleneck convolution
        self.DPRNN = DPRNN(...)  # includes nn.PReLU() and nn.Conv2d(...) for mask estimation
        self.output = nn.Sequential(nn.Conv1d(...), nn.ReLU())  # empirically better
        # or self.output = nn.Sequential(nn.Conv1d(...), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(...)
Do I understand your explanation correctly?
Yes, it is something like that.
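For readers who want to see how the pieces in the outline above could fit together in a forward pass, here is a rough, runnable sketch. It is not the repository's implementation. The class name TinyTasNetSketch and all sizes (enc_dim, feature_dim, win, num_spk) are hypothetical. The DPRNN block is replaced by a trivial stand-in (a PReLU followed by a 1x1 convolution) so that the example stays self-contained. The input padding that a real implementation would need is also omitted.

```python
import torch
import torch.nn as nn


class TinyTasNetSketch(nn.Module):
    """Illustrative mask-based pipeline: encode, estimate masks, decode."""

    def __init__(self, enc_dim=64, feature_dim=32, win=16, num_spk=2):
        super().__init__()
        self.enc_dim, self.num_spk = enc_dim, num_spk
        stride = win // 2
        self.encoder = nn.Conv1d(1, enc_dim, win, stride=stride, bias=False)
        self.enc_LN = nn.GroupNorm(1, enc_dim, eps=1e-8)  # one group: layer-norm-like
        self.BN = nn.Conv1d(enc_dim, feature_dim, 1, bias=False)  # bottleneck
        # Stand-in for the DPRNN separator (assumption, NOT a dual-path RNN).
        self.separator = nn.Sequential(
            nn.PReLU(), nn.Conv1d(feature_dim, feature_dim * num_spk, 1)
        )
        # Plain (ungated) output layer producing non-negative masks.
        self.output = nn.Sequential(nn.Conv1d(feature_dim, enc_dim, 1), nn.ReLU())
        self.decoder = nn.ConvTranspose1d(enc_dim, 1, win, stride=stride, bias=False)

    def forward(self, mixture):
        # mixture: (batch, samples)
        batch = mixture.size(0)
        enc = self.encoder(mixture.unsqueeze(1))            # (B, N, L)
        feats = self.BN(self.enc_LN(enc))                   # (B, F, L)
        sep = self.separator(feats)                         # (B, F*C, L)
        sep = sep.view(batch * self.num_spk, -1, sep.size(-1))  # (B*C, F, L)
        masks = self.output(sep).view(batch, self.num_spk, self.enc_dim, -1)
        masked = enc.unsqueeze(1) * masks                   # masks applied to encoder output
        wav = self.decoder(masked.view(batch * self.num_spk, self.enc_dim, -1))
        return wav.view(batch, self.num_spk, -1)            # (B, C, samples)


model = TinyTasNetSketch()
mixture = torch.randn(3, 8000)  # three random "mixtures" of 8000 samples each
estimates = model(mixture)
print(estimates.shape)
```

The step to notice is `enc.unsqueeze(1) * masks`: the network output scales the encoder representation element-wise, once per speaker, and the decoder then maps each masked representation back to a waveform. With the sizes used here, the 8000-sample input happens to fit the window and stride exactly, so the output length matches the input without padding.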
I tried reimplementing dual-path RNN TasNet after reading your paper.
Is the general structure published in this repository the same as dual-path RNN TasNet? I understand this repository is written for TAC. There seem to be some improvements like gated outputs. Are these modules included in dual-path RNN TasNet? https://github.com/yluo42/TAC/blob/96640a803b8193a7a507652c4a5693e57da03cbd/FaSNet.py#L15-L21