microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

About using BEATs as audio feature extractor #1567

Open XiaokangY opened 1 month ago

XiaokangY commented 1 month ago

Do I have to use audio sampled at 16 kHz to use BEATs? When I feed the extracted features into a ResNet-18 for a downstream classification task, the loss does not decrease.
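A minimal sketch of bringing audio to 16 kHz before it reaches BEATs. It assumes a mono waveform stored as a plain Python list of floats. `resample_linear` and `TARGET_SR` are hypothetical names chosen for illustration. Linear interpolation applies no anti-aliasing low-pass filter, so for real audio a proper resampler such as `torchaudio.functional.resample(waveform, orig_freq, new_freq)` is the safer choice; this sketch only shows the idea.

```python
TARGET_SR = 16_000  # hypothetical constant: the 16 kHz rate asked about above


def resample_linear(samples: list[float], orig_sr: int, target_sr: int = TARGET_SR) -> list[float]:
    """Naive linear-interpolation resampler (assumption: no anti-aliasing filter)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if orig_sr == target_sr or not samples:
        return list(samples)
    # Output length keeps the clip duration unchanged.
    n_out = int(round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples; at the end left == right.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, a 1-second clip recorded at 48 kHz (48,000 samples) comes out as 16,000 samples.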

tcourat commented 3 weeks ago

I believe the model already preprocesses the input data to 16000 Hz here: https://github.com/microsoft/unilm/blob/8ee6f747da65448b125363eabbe64630bb9c4a25/beats/BEATs.py#L127
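A sketch of the frame arithmetic behind a fixed 16000 Hz setting. It assumes (verify against the linked line) that preprocessing calls a Kaldi-style filterbank with `sample_frequency=16000`, a 25 ms frame length, a 10 ms frame shift and the default `snip_edges=True`. In that kind of front end, the sample rate is used to convert window durations into sample counts; the waveform itself is not resampled. The constant and function names below are hypothetical.

```python
FRAME_LENGTH_MS = 25.0  # assumed analysis window length
FRAME_SHIFT_MS = 10.0   # assumed hop between frames
ASSUMED_SR = 16_000     # assumed rate the filterbank is configured with


def window_samples(sr: int, ms: float) -> int:
    """Number of samples covered by `ms` milliseconds at rate `sr`."""
    return int(sr * ms / 1000)


def num_frames(n_samples: int, sr: int = ASSUMED_SR) -> int:
    """Frame count for Kaldi-style framing with snip_edges=True."""
    win = window_samples(sr, FRAME_LENGTH_MS)
    hop = window_samples(sr, FRAME_SHIFT_MS)
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


def effective_window_ms(actual_sr: int, assumed_sr: int = ASSUMED_SR) -> float:
    """Real duration of one window if audio at `actual_sr` is framed as `assumed_sr`."""
    return window_samples(assumed_sr, FRAME_LENGTH_MS) * 1000 / actual_sr
```

Under these assumptions, one second of 16 kHz audio yields 98 frames. One second of 44.1 kHz audio framed as if it were 16 kHz yields 274 frames, and each 400-sample window actually spans about 9.07 ms instead of 25 ms.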

XiaokangY commented 3 weeks ago

> I believe the model already preprocesses the input data to 16000 Hz here: https://github.com/microsoft/unilm/blob/8ee6f747da65448b125363eabbe64630bb9c4a25/beats/BEATs.py#L127

Haha, I have solved this problem. The problem was that the loss would not decrease when the extracted features were fed into the pre-trained model for training. It had been bothering me for a long time.