merlresearch / tf-locoformer

Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
Apache License 2.0

Questions about LocoformerBlock #1

Open Mashiro009 opened 2 months ago

Mashiro009 commented 2 months ago

Hello tf-locoformer team! 👋

Firstly, I wanted to extend my heartfelt appreciation for the groundbreaking work you've accomplished with tf-locoformer. Your innovation truly stands out!

While diving into your research, I had the pleasure of reading through your paper and examining the code. I stumbled upon a small inconsistency that piqued my curiosity.

In the paper, the LocoformerBlock is elegantly outlined with the ConvSwiGLU branch added at half scale (a factor of 1/2), depicted as:

Z ← Z + ConvSwiGLU(Z) / 2     ... (2)
Z ← Z + MHSA(Norm(Z))         ... (3)
Z ← Z + ConvSwiGLU(Z) / 2     ... (4)

This suggests that the ConvSwiGLU output is scaled by 0.5 before the residual summation. However, in the code snippet at https://github.com/merlresearch/tf-locoformer/blob/1d2be38404cf062018a1aaebe4c8103a63b23adc/espnet2/enh/separator/tflocoformer_separator.py#L366-L370, the operation appears to be implemented as a straightforward addition without the scaling factor.
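For concreteness, here is a minimal sketch of the two variants as I read them (not the actual repository code; `norm`, `mhsa`, and `conv_swiglu` are placeholder callables standing in for the real modules):

```python
def locoformer_block_paper(z, norm, mhsa, conv_swiglu):
    """Macaron-style block as written in the paper, Eqs. (2)-(4):
    the ConvSwiGLU branches are scaled by 0.5 before the residual addition."""
    z = z + 0.5 * conv_swiglu(z)  # Eq. (2)
    z = z + mhsa(norm(z))         # Eq. (3)
    z = z + 0.5 * conv_swiglu(z)  # Eq. (4)
    return z


def locoformer_block_as_coded(z, norm, mhsa, conv_swiglu):
    """The same block with plain residual additions, which is how the
    linked code snippet reads to me (no 0.5 factor)."""
    z = z + conv_swiglu(z)
    z = z + mhsa(norm(z))
    z = z + conv_swiglu(z)
    return z
```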

I'm hoping you could kindly clarify this point for me. It would be a tremendous help in my understanding and application of your work. Thank you ever so much for your time and efforts – they are deeply appreciated!

Warmest regards, and thank you very much!

kohei0209 commented 2 months ago

Hi @Mashiro009, thank you for reporting the issue.

I am sorry for the confusion. Yes, we had a 0.5 factor for the paper experiments but removed it by mistake when publishing the code.

However, in preliminary experiments, we found that the scaling factor does not impact the final performance in our model. We kept it in the paper experiments for consistency with some prior work (e.g., Macaron-net).

Although this may be confusing, we will keep the current implementation so that it stays consistent with the pre-trained models, and we will add some comments to clarify this.
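For example, such a comment could look something like the following (a hypothetical sketch, not the actual planned change), making the scale explicit so anyone who wants to match the paper's equations exactly can do so:

```python
# NOTE: the paper's Eqs. (2) and (4) scale the ConvSwiGLU branch by 0.5
# (Macaron-style), but the released code and pre-trained models use a
# scale of 1.0. In our experiments this did not change final performance.
FFN_RESIDUAL_SCALE = 1.0  # set to 0.5 to match the paper's equations

def ffn_residual(z, conv_swiglu, scale=FFN_RESIDUAL_SCALE):
    """ConvSwiGLU residual branch with an explicit Macaron-style scale."""
    return z + scale * conv_swiglu(z)
```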