I'm trying to use BEATs as a feature extractor to compute the similarity between two audio files. The BEATs paper states that cosine similarity is used to compute similarity scores. However, I cannot compute the similarity between two audio files of different lengths, because the extracted features have different shapes. Shouldn't feature extraction map every audio file to a vector of the same dimension?
Thanks in advance for the responses.
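import torch
from scipy.io import wavfile

for ref in reference_set_paths:
    sr, audio = wavfile.read(folder_name / ref)
    # Shape (1, num_samples); cast to float, since wavfile.read often returns integer PCM
    audio = torch.from_numpy(audio).float().unsqueeze(0)
    # Frame-level features: the time dimension depends on the audio length
    rep = BEATs_model.extract_features(audio)[0]
    print(rep.shape)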
Model: BEATs_iter3_plus_AS20K
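To make the shape problem concrete, here is a minimal sketch of one possible workaround: average the frame-level features over the time axis so that every clip becomes a single fixed-size vector, then take the cosine similarity. This is not confirmed to be what the BEATs paper does. The helper names `pooled_embedding` and `clip_similarity` are hypothetical, the random tensors only stand in for real `extract_features` outputs, and the feature size of 768 and the `(1, T, D)` layout are assumptions.

```python
import torch
import torch.nn.functional as F


def pooled_embedding(frame_features: torch.Tensor) -> torch.Tensor:
    """Average frame-level features of shape (1, T, D) over time, giving shape (D,)."""
    return frame_features.mean(dim=1).squeeze(0)


def clip_similarity(rep_a: torch.Tensor, rep_b: torch.Tensor) -> float:
    """Cosine similarity between two clips after mean pooling over time."""
    emb_a = pooled_embedding(rep_a)
    emb_b = pooled_embedding(rep_b)
    return F.cosine_similarity(emb_a, emb_b, dim=0).item()


# Stand-in tensors with different frame counts (T) but the same feature size (D).
# Assumption: real inputs would come from BEATs_model.extract_features(audio)[0].
torch.manual_seed(0)
rep_short = torch.randn(1, 48, 768)
rep_long = torch.randn(1, 496, 768)

score = clip_similarity(rep_short, rep_long)
print(pooled_embedding(rep_short).shape, pooled_embedding(rep_long).shape)
print(score)
```

Mean pooling discards temporal order, so it may not suit every similarity task; max pooling or padding with a mask would be alternatives worth comparing.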