microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[BEATs] How to handle different length of audio files? #1621

Open omerkaanvural opened 2 months ago

omerkaanvural commented 2 months ago

Model: BEATs_iter3_plus_AS20K

I'm trying to use BEATs as a feature extractor to compute the similarity between two audio files. The BEATs paper states that cosine similarity is used to calculate similarity scores. However, I cannot compute the similarity between two audio files of different lengths, because the extracted features have different dimensions. Shouldn't feature extraction map the audio to vectors of the same dimension?

Thanks in advance for the responses.

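import torch
from scipy.io import wavfile

for ref in reference_set_paths:
    sr, audio = wavfile.read(folder_name / ref)  # sample rate and raw samples
    audio = torch.from_numpy(audio).unsqueeze(0)  # add a batch dimension: (1, num_samples)
    rep = BEATs_model.extract_features(audio)[0]  # keep the first returned item
    print(rep.shape)  # shape differs between files of different length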
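One common way to compare clips of different lengths is to average the frame-level features over the time axis, so that every clip yields a single fixed-size vector, and then take the cosine similarity of those vectors. The sketch below illustrates that idea only. It is not taken from the BEATs paper or repository. The helper names `pool_features` and `clip_similarity` are hypothetical, and the random tensors stand in for the output of `extract_features`, assuming a shape of (1, num_frames, 768).

```python
import torch
import torch.nn.functional as F


def pool_features(frame_features: torch.Tensor) -> torch.Tensor:
    """Average (batch, num_frames, dim) features over time -> (batch, dim)."""
    return frame_features.mean(dim=1)


def clip_similarity(rep_a: torch.Tensor, rep_b: torch.Tensor) -> float:
    """Cosine similarity between the time-averaged features of two clips."""
    emb_a = pool_features(rep_a)
    emb_b = pool_features(rep_b)
    return F.cosine_similarity(emb_a, emb_b, dim=-1).item()


# Stand-in tensors (assumption): random values in place of real model output,
# with a different number of frames per clip and a feature size of 768.
torch.manual_seed(0)
rep_short = torch.randn(1, 48, 768)
rep_long = torch.randn(1, 496, 768)

# After pooling, both clips have the same shape despite different lengths.
print(pool_features(rep_short).shape, pool_features(rep_long).shape)
score = clip_similarity(rep_short, rep_long)
print(score)
```

Mean pooling discards temporal order. If that matters for the comparison, max pooling or a frame-wise alignment may be worth trying instead.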