sp-uhh / storm

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
MIT License

Stereo audio #9

Closed: adeelabbas closed this issue 1 year ago

adeelabbas commented 1 year ago

Hi,

Does the tool work with stereo audio input? Do you know what changes would be needed to support it?

Adeel

jmlemercier commented 1 year ago

Hi,

You can provide multichannel inputs to the diffusion model as a batch. Each channel is then processed independently, so the inter-channel magnitude and phase differences are not exploited, and there is no guarantee that they will be preserved by the processing.

For your information, however, we looked into diffusion models that leverage multi-channel information and can share a few insights, some of which confirm the results of Tesch et al., "Insights Into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement", TASL 2022, and Tesch et al., "Nonlinear Spatial Filtering in Multichannel Speech Enhancement", TASL 2021:

Of course, this should be taken with caution: the extension we proposed for multi-channel processing was very naive and could probably be improved. Given the preliminary results, we did not pursue that direction further.
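
For reference, the channels-as-batch idea mentioned at the start of this reply could look roughly like the sketch below. It is only an illustration under assumptions: `enhance_batch` is a hypothetical callable standing in for the model (it is not StoRM's actual API), `placeholder_model` is a toy stand-in that merely peak-normalises each signal, and the input is assumed to be laid out as (num_samples, num_channels).

```python
import numpy as np


def enhance_channels_independently(audio, enhance_batch):
    """Enhance a multichannel signal by treating each channel as one batch item.

    audio: array of shape (num_samples, num_channels) (assumed layout).
    enhance_batch: hypothetical callable that maps a (batch, num_samples)
        array of mono signals to an array of the same shape. It stands in
        for the enhancement model and is NOT the repository's real interface.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[:, None]  # mono input: add a channel axis

    batch = audio.T  # (num_channels, num_samples): channels become batch items
    enhanced = np.asarray(enhance_batch(batch))
    if enhanced.shape != batch.shape:
        raise ValueError("enhance_batch must return an array of the same shape")
    return enhanced.T  # back to (num_samples, num_channels)


def placeholder_model(batch):
    # Toy stand-in for the real model: peak-normalises each mono signal
    # separately, i.e. it never looks at the other channels.
    peak = np.max(np.abs(batch), axis=1, keepdims=True)
    return batch / np.maximum(peak, 1e-8)


def level_ratio(x):
    # Peak level of channel 1 relative to channel 0.
    return float(np.max(np.abs(x[:, 1])) / np.max(np.abs(x[:, 0])))


# Synthetic stereo signal: the right channel is a quieter copy of the left.
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
stereo = np.stack([left, 0.5 * left], axis=1)

out = enhance_channels_independently(stereo, placeholder_model)
print(out.shape)  # (16000, 2)

# Compare the inter-channel level difference before and after processing.
ratio_in = level_ratio(stereo)
ratio_out = level_ratio(out)
print(ratio_in, ratio_out)
```

With this toy stand-in, the level difference between the two channels is lost after processing, because each channel is handled without reference to the other. A real model would behave differently in detail, but it illustrates why per-channel processing gives no guarantee on inter-channel cues.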