'Voice Conversion' paper candidate 2309.08208

github-actions[bot] commented 1 year ago

Please check whether this paper is about 'Voice Conversion' or not.

article info.

title: HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
summary: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer by adopting two components: (1) Hierarchical pooling method progressively reducing the sequence length to eliminate duplicated information (2) Multi-level classification token aggregation method utilizing classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experimental results on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.
id: http://arxiv.org/abs/2309.08208v1

judge

Write [vclab::confirmed] or [vclab::excluded] in comment.

tarepan commented 1 year ago

[vclab::excluded]

github-actions[bot] commented 1 year ago

Thunk you very much for contribution! Your judgement is refrected in arXivSearches.json, and is going to be used for VCLab's activity. Thunk you so much.

tarepan / VoiceConversionLab

'Voice Conversion' paper candidate 2309.08208 #516

article info.

judge