Open Kanraaaaa opened 8 months ago
Hi, I have the same problem with WavLM Large. Have you solved it?
Hi, I haven't solved this problem :( I tried different torch versions (including 1.12, 1.13, 2.0.1, ...) on both Windows and Linux, and the problem persists.
But I got relatively reasonable results by loading the Hugging Face model from https://huggingface.co/microsoft/wavlm-large.
P.S. I'm not sure whether values around 600 and -100 are actually reasonable. The max and min values extracted from wavlm-base+ are around ±5.
import torch
from transformers import AutoModel

# Load the Hugging Face WavLM Large checkpoint from a local path
wavlm = AutoModel.from_pretrained('pretrained/large_hf')
wav_input_16khz = torch.randn(1, 10000)
with torch.no_grad():
    wav_embeddings = wavlm(input_values=wav_input_16khz, output_hidden_states=True).hidden_states
# Concatenate the per-layer hidden states along dim 0
rep = torch.cat(wav_embeddings)
print(rep.shape, rep.max(), rep.min())
# gpu: torch.Size([25, 31, 1024]) tensor(608.1801, device='cuda:0') tensor(-123.8610, device='cuda:0')
# cpu: torch.Size([25, 31, 1024]) tensor(602.7969) tensor(-124.4809)
import torch
from transformers import AutoModel

# Same test with the WavLM Base+ checkpoint, run on GPU
wavlm = AutoModel.from_pretrained('pretrained/wavlm-base-plus').cuda()
wav_input_16khz = torch.randn(1, 10000).cuda()
with torch.no_grad():
    wav_embeddings = wavlm(input_values=wav_input_16khz, output_hidden_states=True).hidden_states
# Concatenate the per-layer hidden states along dim 0
rep = torch.cat(wav_embeddings)
print(rep.shape, rep.max(), rep.min())
# cpu torch.Size([13, 31, 768]) tensor(3.1511) tensor(-4.5721)
# gpu torch.Size([13, 31, 768]) tensor(3.2018, device='cuda:0') tensor(-4.5899, device='cuda:0')
Hi, thanks for sharing the pre-trained models. I have run into the following problem: I followed the sample code on this page: https://github.com/microsoft/unilm/tree/master/wavlm, but I got abnormal layer results with WavLM-Large.pt.
When I run inference on CPU, the results of the last 2 layers are always NaN. When I run inference on GPU, the max value of layer_results is 3.4342e+37.
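To see which layers go wrong, it may help to print NaN/Inf counts and the finite min/max for each layer separately instead of one global max. The following is only a minimal sketch: `summarize_layers` is a hypothetical helper (not part of WavLM or transformers), and the `demo` tensors are stand-ins for real layer outputs.

```python
import torch

def summarize_layers(layers):
    """Return one dict per layer with NaN/Inf counts and finite min/max.

    `layers` is any iterable of tensors, e.g. the `hidden_states` tuple
    returned by a Hugging Face model (hypothetical helper, for illustration).
    """
    stats = []
    for idx, x in enumerate(layers):
        x = x.detach().float()
        # Keep only finite entries so NaN/Inf do not hide the real range
        finite = x[torch.isfinite(x)]
        stats.append({
            "layer": idx,
            "nan": int(torch.isnan(x).sum()),
            "inf": int(torch.isinf(x).sum()),
            "min": finite.min().item() if finite.numel() else None,
            "max": finite.max().item() if finite.numel() else None,
        })
    return stats

# Stand-in data: one normal layer with a single large value, one all-NaN layer
demo = [torch.zeros(1, 31, 8), torch.full((1, 31, 8), float("nan"))]
demo[0][0, 0, 0] = 600.0

for row in summarize_layers(demo):
    print(row)
```

With the Hugging Face snippets in this thread, `summarize_layers(wav_embeddings)` should work directly on the `hidden_states` tuple. For the README sample, the per-layer tensors would first have to be pulled out of `layer_results`; how exactly depends on that code, so treat that part as an assumption.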