Closed AhmedHashish123 closed 2 years ago
Hi @AhmedHashish123 ,
Thank you for your interest and the question! We accidentally used the wrong feature extraction function in our example usage for loading pretrained models, and we have fixed it in the README file. If you go through the example usage, the "f" tensor it returns should have size torch.Size([1, 31, 768]), where 1 is the batch size, 31 is the number of time steps, and 768 is the hidden state dimension. Since the window size and stride of the CNN feature extractor are 400 and 320 samples respectively, a 10000-sample input gives (10000 - 400) / 320 + 1 = (10000 - 80) / 320 = 31 time steps.
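For anyone who wants to check the time-step arithmetic for other input lengths, here is a minimal sketch. The 400/320 window and stride come from the reply above. The per-layer kernel/stride list is an assumption (the usual wav2vec 2.0-style conv stack), not something confirmed in this thread, and the function names are made up for illustration.

```python
def num_frames(num_samples: int, window: int = 400, stride: int = 320) -> int:
    """Number of time steps for a sliding window without padding."""
    if num_samples < window:
        return 0
    return (num_samples - window) // stride + 1


# ASSUMPTION: (kernel, stride) pairs of a wav2vec 2.0-style CNN feature
# extractor. Together they give an effective window of 400 and stride of 320.
ASSUMED_CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames_layerwise(num_samples: int, layers=ASSUMED_CONV_LAYERS) -> int:
    """Apply the unpadded conv output-length formula layer by layer."""
    length = num_samples
    for kernel, stride in layers:
        length = max(0, (length - kernel) // stride + 1)
    return length


print(num_frames(10000))            # 31
print(num_frames_layerwise(10000))  # 31
```

Both functions agree on 31 for a 10000-sample input, which matches the second dimension of torch.Size([1, 31, 768]).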
Thank you for making it clear.
When I try to run the example in the UniSpeech-SAT directory of this repo, I get 'f' as a tensor of size torch.Size([1, 512, 31]). What exactly does the variable f represent?