starrytong / SCNet

MIT License
59 stars · 5 forks

about Model Implementation #18

Open autumn-2-net opened 2 weeks ago

autumn-2-net commented 2 weeks ago

Thank you for your great work.

Could you explain why SCNet directly predicts the complex spectrum instead of predicting a mask like BandSplitRoformer? Is this choice made because predicting the complex spectrum achieves better performance?

starrytong commented 2 weeks ago

Directly predicting the complex spectrum or predicting the mask are both common approaches in separation tasks. In my experiments, predicting the complex spectrum performs slightly better.
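For readers comparing the two targets, here is a toy NumPy sketch of the difference (shapes, names, and the random "network outputs" are purely illustrative, not SCNet's or BandSplitRoformer's actual code): a masking head is tied to the mixture via complex multiplication, while a direct head emits the source spectrum outright.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy mixture STFT, stored as real/imag channels: (2, F, T)
mix = rng.standard_normal((2, 4, 8))

def complex_mul(a, b):
    """Channel-wise complex multiplication of (real, imag) tensors."""
    re = a[0] * b[0] - a[1] * b[1]
    im = a[0] * b[1] + a[1] * b[0]
    return np.stack([re, im])

# (a) Mask target: the net predicts a complex mask, and the estimate
#     is coupled to the mixture's magnitude/phase by multiplication.
mask = rng.standard_normal((2, 4, 8))       # stand-in for a net output
masked_est = complex_mul(mix, mask)

# (b) Direct target: the net's output *is* the source spectrum, with
#     no structural coupling to the mixture.
direct_est = rng.standard_normal((2, 4, 8))  # stand-in for a net output

print(masked_est.shape, direct_est.shape)
```

The lack of coupling in (b) is one intuition for why a direct head has more freedom, for better (slightly higher SDR here) or worse (it can emit energy the mixture never had).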

autumn-2-net commented 2 weeks ago

However, directly predicting the complex spectrum seems to generate additional background noise. Is there any solution to this?

SCNet: [image]

BandSplitRoformer: [image]

starrytong commented 2 weeks ago

I think this issue might not only be related to the prediction target. I don't have a good solution for now, but I will continue to try the approach of predicting masks.

autumn-2-net commented 1 week ago

And why does the Dual-path Module use an LSTM instead of RoFormer? Can an LSTM achieve better results than RoFormer?
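For context, the dual-path pattern being asked about alternates a sequence model along time with one along frequency. A minimal PyTorch sketch with BiLSTMs (illustrative only; the dimensions, residual wiring, and class name are my assumptions, not the repo's implementation):

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Illustrative dual-path block: one BiLSTM scans along time for
    each frequency band, a second scans along frequency for each frame.
    Hypothetical sketch, not SCNet's actual module."""
    def __init__(self, dim=16):
        super().__init__()
        # hidden = dim // 2 so the bidirectional output matches dim
        self.time_rnn = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.freq_rnn = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, x):                              # x: (B, F, T, C)
        b, f, t, c = x.shape
        y, _ = self.time_rnn(x.reshape(b * f, t, c))   # scan along time
        x = x + y.reshape(b, f, t, c)                  # residual
        z = x.permute(0, 2, 1, 3).reshape(b * t, f, c)
        y, _ = self.freq_rnn(z)                        # scan along frequency
        x = x + y.reshape(b, t, f, c).permute(0, 2, 1, 3)
        return x

x = torch.randn(1, 8, 10, 16)
print(DualPathBlock(16)(x).shape)  # torch.Size([1, 8, 10, 16])
```

A RoFormer-style variant would swap each LSTM for a self-attention layer with rotary embeddings over the same axes; the question is whether that buys quality at the extra cost.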

autumn-2-net commented 1 week ago

Why is FeatureConversion used? Is it because it can increase the number of parameters and improve quality without affecting speed?

Also, if I increase the kernel_size in the ConvolutionModule, can it achieve better results?

starrytong commented 1 week ago

I haven't tried RoFormer, so I can't say which one is better. FeatureConversion allows the model to learn in different spaces; it does affect the speed, but it is more effective than continuing to stack DualPath layers (6 layers + FeatureConversion > 9 layers of DP). Increasing the kernel size might be slightly better, but the improvement may not be significant.
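One way to picture "learning in different spaces" between DualPath groups is a transform that moves features into a conjugate domain before the next group. The sketch below is purely illustrative (the FFT-along-time choice and real/imag re-stacking are my assumptions, not necessarily what the repo's FeatureConversion does):

```python
import torch

def feature_conversion(x):
    """Hypothetical domain switch between DualPath groups: an FFT along
    the time axis, with real/imag parts re-stacked on the channel axis
    to keep a real-valued tensor. Illustrative only."""
    spec = torch.fft.rfft(x, dim=-2)                 # complex, T -> T//2 + 1
    return torch.cat([spec.real, spec.imag], dim=-1)  # channels doubled

x = torch.randn(1, 8, 10, 4)         # (B, F, T, C)
print(feature_conversion(x).shape)   # (1, 8, 6, 8)
```

The point of such a step is that the following layers see structure (e.g. periodicity along time) that was not directly exposed in the previous space, which matches the comment that 6 layers + FeatureConversion beat 9 plain DualPath layers.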