For ERes2NetV2 performance on short-duration wavs

modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Apache License 2.0


For ERes2NetV2 performance on short-duration wavs #111

Closed: JiJiJiang closed this issue 2 months ago

JiJiJiang commented 2 months ago

Thank you for the well-designed ERes2Net model and for making it open source.

As you mentioned, the V2 version of ERes2Net improves ERes2Net's ability to extract features from short-duration audio. Are there any experimental results that support this conclusion?

If so, ERes2NetV2 could be better suited to the diarization task in a traditional clustering-based system, where speaker embeddings are usually extracted with a sliding window of, e.g., 1.5 s.
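As a minimal sketch of the sliding-window step mentioned above, the helper below computes the sample ranges of 1.5 s windows. It assumes 16 kHz audio and a 0.75 s hop, neither of which is stated in the thread, and `sliding_windows` is a hypothetical name, not a 3D-Speaker API.

```python
from __future__ import annotations


def sliding_windows(num_samples: int, sample_rate: int = 16000,
                    win_s: float = 1.5, hop_s: float = 0.75) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length analysis windows.

    Assumption (not from the thread): a 0.75 s hop, i.e. 50% overlap.
    Windows of win_s seconds start every hop_s seconds. If the signal does
    not end exactly on a window boundary, one extra window is aligned to the
    end so the tail is covered. A signal shorter than one window yields a
    single window spanning the whole signal.
    """
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= win:
        # Too short for a full window: use whatever audio there is.
        return [(0, num_samples)]
    windows = [(start, start + win)
               for start in range(0, num_samples - win + 1, hop)]
    if windows[-1][1] < num_samples:
        # Cover the leftover tail with a window ending at the last sample.
        windows.append((num_samples - win, num_samples))
    return windows


# 4 s of 16 kHz audio -> five 1.5 s windows.
print(sliding_windows(4 * 16000))
```

In a clustering-based diarization pipeline, each window would then be passed to the speaker embedding extractor (for example ERes2NetV2), and the resulting embeddings clustered; that is why accuracy on segments as short as 1.5 s matters here.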

yfchenlucky commented 2 months ago

Our experiments have thoroughly validated this conclusion, and the paper will be released in June. Thank you for your interest. You are welcome to join the 3D-Speaker technical sharing session tonight at 8 pm through this link: https://mp.weixin.qq.com/s/uwvVUIDb0eaAHlfWiuwEoQ.

JiJiJiang commented 2 months ago

OK, I see. Thank you for your answer. Looking forward to your talk and paper.