ly-zhu / Leveraging-Category-Information-for-Single-Frame-Visual-Sound-Source-Separation

PyTorch implementation of "Leveraging Category Information for Single-Frame Visual Sound Source Separation"
https://ly-zhu.github.io/leveraging-category-information-for-single-frame-visual-sound-source-separation
MIT License

separate sound type #2

Open slliugit opened 1 year ago

slliugit commented 1 year ago

Hi, could you please tell me whether the model can separate the sound of a musical instrument from a person singing? Thank you!

ly-zhu commented 1 year ago

The idea of this work is to utilise the appearance information of the objects (the instrument itself, and the human lips or head region in your case) to separate their corresponding sounds. So in my opinion, it should work.

The problem I see is that musical instruments are usually played by a person, so the person is often present in the image together with the instrument. This may cause some "confusion" when training the network if the human lips or head region are part of the "appearance" image that represents the instrument.
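To make the reply concrete, here is a minimal sketch of the appearance-conditioned separation idea described above: an appearance embedding of the visible object (an instrument crop, or a singer's head/lip region) modulates an audio network that predicts a spectrogram mask for that source. All module and variable names here are hypothetical illustrations, not the actual code from this repository.

```python
import torch
import torch.nn as nn


class AppearanceConditionedSeparator(nn.Module):
    """Hypothetical sketch: predict a mask over the mixture magnitude
    spectrogram for one source, conditioned on an appearance embedding
    of the visible object (instrument, or singer's head/lip region)."""

    def __init__(self, vis_dim=512, base=16):
        super().__init__()
        # small audio encoder over the mixture magnitude spectrogram
        self.enc = nn.Sequential(
            nn.Conv2d(1, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
        )
        # project the appearance feature to per-channel scale/shift
        # (FiLM-style conditioning; one common way to inject vision)
        self.film = nn.Linear(vis_dim, 2 * base)
        self.dec = nn.Conv2d(base, 1, 1)

    def forward(self, mix_spec, vis_feat):
        # mix_spec: (B, 1, F, T) magnitude; vis_feat: (B, vis_dim)
        h = self.enc(mix_spec)
        gamma, beta = self.film(vis_feat).chunk(2, dim=1)
        h = gamma[:, :, None, None] * h + beta[:, :, None, None]
        mask = torch.sigmoid(self.dec(h))  # soft mask in (0, 1)
        return mask * mix_spec             # separated magnitude


model = AppearanceConditionedSeparator()
mix = torch.randn(2, 1, 256, 64).abs()  # fake mixture spectrograms
vis = torch.randn(2, 512)  # e.g. a CNN feature of the object crop
out = model(mix, vis)
print(out.shape)  # torch.Size([2, 1, 256, 64])
```

The caveat from the reply maps directly onto `vis`: if the crop used to embed the instrument also contains the player's face, the same visual evidence can point to two different sources, which is the "confusion" described above.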