pquochuy / idsegan


Use of the IDSEGAN for Music Source Separation #1

Open RadioAngurem opened 4 years ago

RadioAngurem commented 4 years ago

Hi, I'm interested in the Music Source Separation (MSS) field, but all the SotA models, like Demucs and Conv-TasNet, produce some noise in every output track.

Would it be feasible to train the ISEGAN model to "denoise" the output tracks of an MSS network (bass, drums, others and vocals)? That would mean training four IDSEGAN networks, one per stem, with pairs MSSOutputBass-OriginalBass, MSSOutputDrums-OriginalDrums, etc.

Would it be feasible to scale the 16 kHz SE task to the 44.1 kHz used in the MSS task, or could the need for more frequency bins make the network unreliable?

pquochuy commented 4 years ago

Hi, I am not really certain how ISEGAN or DSEGAN will behave on your source separation artifacts, but you should definitely try it out. I hope it works out for you.

Since the network works on 1-second segments, at a 44.1 kHz sampling rate you will probably need to configure a deeper encoder (and decoder) if you want to encode the input signal to a dimensionality as low as with the 16 kHz sampling rate. Expect the network footprint to be bigger, though.
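
A back-of-the-envelope sketch of that depth estimate. It assumes every encoder layer halves the time axis (stride 2) and that the 16 kHz model reduces a 16384-sample segment to a bottleneck of length 8, as in the original SEGAN design. Those numbers and the `encoder_depth` helper are illustrative assumptions, not values read from this repository's code.

```python
import math


def encoder_depth(segment_len: int, latent_len: int, stride: int = 2) -> int:
    """Count the strided layers needed to shrink `segment_len` samples
    to at most `latent_len` time steps (hypothetical helper)."""
    depth = 0
    length = segment_len
    while length > latent_len:
        # Each strided layer divides the time axis by `stride`, rounding up.
        length = math.ceil(length / stride)
        depth += 1
    return depth


# Assumed 16 kHz baseline: 16384 samples (about 1 s) down to 8 time steps.
baseline_depth = encoder_depth(16384, 8)

# 1 s at 44.1 kHz is 44100 samples; same assumed bottleneck length.
hi_rate_depth = encoder_depth(44100, 8)

print(baseline_depth, hi_rate_depth)
```

Under these assumptions the 44.1 kHz encoder needs two more stride-2 layers than the 16 kHz one (13 versus 11), and the decoder grows by the same amount. In practice it may be more convenient to pad each segment to a power of two such as 65536 samples, so that every layer halves the length exactly.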