I have a suggestion for improving the separation quality. You probably already know about my implementation of multiband spectrograms, but I would be glad if you implemented the idea at the neural network level.
I am not well versed in neural network architecture, so I decided not to touch the network code and instead combined several spectrograms into one. Such spectrograms take up less memory and have better time-frequency resolution than single-band spectrograms. However, it could work even better if each band were connected to a separate network:
Here x is a dictionary containing several spectrograms at different resolutions. For example:
1st band: sr=7350, n_fft=640, hop_length=80
2nd band: sr=7350, n_fft=320, hop_length=80
3rd band: sr=14700, n_fft=512, hop_length=160
4th band: sr=44100, n_fft=960, hop_length=480
Unused frequencies are cut out of each band. The difficulty in implementation is that each band then contains a different number of frequency bins.
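To make the bin mismatch concrete, here is a minimal sketch that computes the STFT geometry of the four example bands. The names `BANDS` and `band_summary` are made up for illustration and are not from any existing code. The bin counts are for the full one-sided STFT, before cropping, because the exact crop ranges are not given here.

```python
# Band settings copied from the example above.
BANDS = {
    1: {"sr": 7350, "n_fft": 640, "hop_length": 80},
    2: {"sr": 7350, "n_fft": 320, "hop_length": 80},
    3: {"sr": 14700, "n_fft": 512, "hop_length": 160},
    4: {"sr": 44100, "n_fft": 960, "hop_length": 480},
}


def band_summary(cfg):
    """Return basic STFT geometry for one band (before any cropping)."""
    return {
        # A one-sided STFT of a real signal has n_fft // 2 + 1 frequency bins.
        "n_bins": cfg["n_fft"] // 2 + 1,
        # Number of spectrogram frames produced per second of audio.
        "frames_per_sec": cfg["sr"] / cfg["hop_length"],
        # Width of one frequency bin in Hz.
        "hz_per_bin": cfg["sr"] / cfg["n_fft"],
    }


summaries = {band: band_summary(cfg) for band, cfg in BANDS.items()}

for band, info in summaries.items():
    print(band, info)
```

With these settings all four bands have the same frame rate (91.875 frames per second), so their time axes line up. Only the frequency axis differs: 321, 161, 257 and 481 bins before cropping. A per-band network would therefore presumably need its own input size along the frequency axis, while the time dimension could be shared.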