microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

How to load WavLM ECAPA-TDNN embeddings for speaker verification? #1369

Open amitli1 opened 7 months ago

amitli1 commented 7 months ago

According to the WavLM paper (WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing), the authors used an ECAPA-TDNN embedding model for the downstream speaker verification task.

I searched but couldn't find one. Is there an implementation I can use with the model, i.e. WavLM embeddings produced by ECAPA-TDNN?

For example:

import torch
from transformers import Wav2Vec2FeatureExtractor
from transformers import WavLMForXVector
import soundfile as sf

wav_tensor, sr = sf.read(r"myfile.wav")

device = "cuda" if torch.cuda.is_available() else "cpu"
feature_extractor_wav2vec = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model_wav_lm = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv").to(device)

inputs = feature_extractor_wav2vec(wav_tensor,sampling_rate=16000,return_tensors="pt",padding=True).to(device)
with torch.no_grad():
    embeddings = model_wav_lm(**inputs).embeddings

I couldn't tell whether these embeddings come from ECAPA-TDNN or from X-Vector.
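Whichever head produces the embeddings, verification usually comes down to comparing two of them. The following is a minimal sketch of cosine-similarity scoring, assuming the embeddings have been converted to plain Python lists. The function names, the toy vectors, and the `0.86` threshold are illustrative assumptions, not values taken from the WavLM paper or this repository; a real threshold would need tuning on a development set.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.86):
    """Accept the pair as the same speaker if the score reaches the threshold.

    The default threshold is a hypothetical placeholder, not a tuned value.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 3-dimensional vectors standing in for real speaker embeddings.
enroll = [0.2, 0.9, -0.1]
test_same = [0.25, 0.85, -0.05]
test_other = [-0.7, 0.1, 0.6]

print(same_speaker(enroll, test_same))
print(same_speaker(enroll, test_other))
```

With the snippet above, something like `embeddings[0].cpu().tolist()` should give a list in the expected form, one per utterance.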

Edresson commented 6 months ago

@amitli1 Did you find any solution for this?

I think the code is available here: https://github.com/microsoft/UniSpeech/blob/e3043e2021d49429a406be09b9b8432febcdec73/downstreams/speaker_verification/models/ecapa_tdnn.py, but I didn't find any checkpoint for it. A lot of papers are currently using WavLM-TDNN, so I'm not sure what we are missing. The checkpoint might be available somewhere.

chenyang399 commented 2 months ago

I am also searching for the WavLM-TDNN checkpoint but have found nothing. I think we need to train it ourselves using SUPERB. However, SUPERB doesn't have the ECAPA-TDNN code.

maxpain commented 1 month ago

Any updates?