summary: Voice data generated on instant messaging or social media applications
contains unique user voiceprints that may be abused by malicious adversaries
for identity inference or identity theft. Existing voice anonymization
techniques, e.g., signal processing and voice conversion/synthesis, suffer from
degradation of perceptual quality. In this paper, we develop a voice
anonymization system, named V-Cloak, which attains real-time voice
anonymization while preserving the intelligibility, naturalness and timbre of
the audio. Our designed anonymizer features a one-shot generative model that
modulates the features of the original audio at different frequency levels. We
train the anonymizer with a carefully-designed loss function. Apart from the
anonymity loss, we further incorporate the intelligibility loss and the
psychoacoustics-based naturalness loss. The anonymizer can realize untargeted
and targeted anonymization to achieve the anonymity goals of unidentifiability
and unlinkability.
We have conducted extensive experiments on four datasets, i.e., LibriSpeech
(English), AISHELL (Chinese), CommonVoice (French) and CommonVoice (Italian),
five Automatic Speaker Verification (ASV) systems (including two DNN-based, two
statistical and one commercial ASV), and eleven Automatic Speech Recognition
(ASR) systems (for different languages). Experiment results confirm that
V-Cloak outperforms five baselines in terms of anonymity performance. We also
demonstrate that V-Cloak trained only on the VoxCeleb1 dataset against
ECAPA-TDNN ASV and DeepSpeech2 ASR has transferable anonymity against other
ASVs and cross-language intelligibility for other ASRs. Furthermore, we verify
the robustness of V-Cloak against various de-noising techniques and adaptive
attacks. Hopefully, V-Cloak may provide a cloak for us in a prism world.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization
id: http://arxiv.org/abs/2210.15140v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.