ly-zhu / Leveraging-Category-Information-for-Single-Frame-Visual-Sound-Source-Separation

PyTorch implementation of "Leveraging Category Information for Single-Frame Visual Sound Source Separation"
https://ly-zhu.github.io/leveraging-category-information-for-single-frame-visual-sound-source-separation
MIT License

separate sound type #2

Open slliugit opened 1 year ago

slliugit commented 1 year ago

Hi, could you please tell me whether the model can separate the sound of a musical instrument from a person singing? Thank you!

ly-zhu commented 1 year ago

The idea of this work is to utilise the appearance information of the objects (the instrument itself, and the human lips or head region in your case) to separate their corresponding sounds. So in my opinion, it should work.

The problem I see is that musical instruments are usually played by a person, so the person is often present in the image together with the instrument. This may cause some "confusion" when training the network if the human lips or head region are part of the "appearance" image that represents the instrument.
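To make the reply concrete, here is a minimal sketch of the appearance-conditioned separation idea described above: an appearance embedding of the visible object (an instrument crop, or a singer's head/lip region) modulates an audio network that predicts a spectrogram mask for that source. All module and variable names here are hypothetical illustrations, not the actual code from this repository.

```python
import torch
import torch.nn as nn


class AppearanceConditionedSeparator(nn.Module):
    """Hypothetical sketch: predict a mask over the mixture magnitude
    spectrogram for one source, conditioned on an appearance embedding
    of the visible object (instrument, or singer's head/lip region)."""

    def __init__(self, vis_dim=512, base=16):
        super().__init__()
        # small audio encoder over the mixture magnitude spectrogram
        self.enc = nn.Sequential(
            nn.Conv2d(1, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
        )
        # project the appearance feature to per-channel scale/shift
        # (FiLM-style conditioning; one common way to inject vision)
        self.film = nn.Linear(vis_dim, 2 * base)
        self.dec = nn.Conv2d(base, 1, 1)

    def forward(self, mix_spec, vis_feat):
        # mix_spec: (B, 1, F, T) magnitude; vis_feat: (B, vis_dim)
        h = self.enc(mix_spec)
        gamma, beta = self.film(vis_feat).chunk(2, dim=1)
        h = gamma[:, :, None, None] * h + beta[:, :, None, None]
        mask = torch.sigmoid(self.dec(h))  # soft mask in (0, 1)
        return mask * mix_spec             # separated magnitude


model = AppearanceConditionedSeparator()
mix = torch.randn(2, 1, 256, 64).abs()  # fake mixture spectrograms
vis = torch.randn(2, 512)  # e.g. a CNN feature of the object crop
out = model(mix, vis)
print(out.shape)  # torch.Size([2, 1, 256, 64])
```

The caveat from the reply maps directly onto `vis`: if the crop used to embed the instrument also contains the player's face, the same visual evidence can point to two different sources, which is the "confusion" described above.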