Directly predicting the complex spectrum or predicting the mask are both common approaches in separation tasks. In my experiments, predicting the complex spectrum performs slightly better.
However, directly predicting the complex spectrum seems to introduce additional background noise. Is there any solution to this?
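For readers unfamiliar with the two prediction targets, here is a minimal PyTorch sketch contrasting a head that regresses the complex spectrum directly with one that predicts a complex ratio mask applied to the mixture spectrum. The layer choices, names, and shapes are illustrative assumptions, not code from SCNet or BandSplit RoFormer.

```python
import torch
import torch.nn as nn

class SpectrumHead(nn.Module):
    """Directly regress the separated complex spectrum from latent features (illustrative)."""
    def __init__(self, hidden_dim, n_freq):
        super().__init__()
        # 2 * n_freq outputs = real and imaginary parts per frequency bin
        self.proj = nn.Linear(hidden_dim, n_freq * 2)

    def forward(self, feats):                     # feats: (batch, time, hidden_dim)
        out = self.proj(feats)                    # (batch, time, n_freq * 2)
        real, imag = out.chunk(2, dim=-1)
        return torch.complex(real, imag)          # predicted source spectrum

class MaskHead(nn.Module):
    """Predict a complex ratio mask and apply it to the mixture spectrum (illustrative)."""
    def __init__(self, hidden_dim, n_freq):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, n_freq * 2)

    def forward(self, feats, mix_spec):           # mix_spec: (batch, time, n_freq), complex
        out = self.proj(feats)
        m_real, m_imag = out.chunk(2, dim=-1)
        mask = torch.complex(m_real, m_imag)
        return mask * mix_spec                    # masked mixture = predicted source
```

The masked output is always tied to the mixture spectrum, while the direct-regression head is free to produce energy that is not in the mixture, which is one plausible (but unconfirmed) reason the two targets can sound different.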
SCNet:
BandSplit RoFormer:
I think this issue might not only be related to the prediction target. I don't have a good solution for now, but I will continue to try the approach of predicting masks.
Also, why does the Dual-Path Module use LSTM instead of RoFormer? Can LSTM achieve better results than RoFormer?
Why is FeatureConversion used? Is it because it can increase the number of parameters and improve quality without affecting speed?
Also, if I increase the kernel_size in the ConvolutionModule, can it achieve better results?
I haven't tried RoFormer, so I can't determine which one is better. FeatureConversion allows the model to learn in different spaces; it does affect speed, but it is more effective than simply stacking more DualPath layers (6 layers + FeatureConversion > 9 DP layers). Increasing the kernel size might be slightly better, but the improvement may not be significant.
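As a rough illustration of the "learning in different spaces" idea, here is a hypothetical sketch of a conversion block that moves dual-path features into a transformed domain via an FFT along the frequency axis and projects back to the original channel count so layers can keep stacking. The module name, the FFT choice, and the shapes are assumptions for illustration only, not the actual SCNet FeatureConversion implementation.

```python
import torch
import torch.nn as nn

class ToyFeatureConversion(nn.Module):
    """Hypothetical sketch (not the SCNet code): transform features with an FFT
    along the frequency axis, then project real/imag parts back to the original
    channel count, so the following dual-path layer sees a different space."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x):                         # x: (batch, channels, freq, time)
        spec = torch.fft.fft(x, dim=2)            # complex features, same freq length
        y = torch.cat([spec.real, spec.imag], dim=1)
        return self.proj(y)                       # back to (batch, channels, freq, time)

# Illustrative alternation between dual-path layers:
#   x = dual_path_layer_1(x)
#   x = feature_conversion(x)   # instance of ToyFeatureConversion
#   x = dual_path_layer_2(x)
```

The point of the sketch is only that inserting such a conversion lets successive dual-path layers operate on different representations, which is the stated reason it can beat adding more identical DP layers.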
Thank you for your great work.
Could you explain why SCNet directly predicts the complex spectrum instead of predicting a mask like BandSplitRoformer? Is this choice made because predicting the complex spectrum achieves better performance?