Closed adityatb closed 1 year ago
Hello, thanks for the issue. We somehow missed it until now. We will look into it.
Happy to answer your question. I ran into a similar tensor mismatch when testing. The problem here is the interaction between the STFT settings and BATCH_SIZE. Try changing the HOP_Length parameter in config.py to reduce the frame spacing, so that the magnitude spectrogram comes out with a width of 64. That worked for me.
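To illustrate the hop-length suggestion above, here is a minimal sketch of how the hop length sets the spectrogram width. It assumes a centered STFT (the default convention in librosa.stft and torch.stft), where the frame count is 1 + num_samples // hop_length. The function names and the 16000-sample clip length are hypothetical, chosen only for illustration; the values this repo actually uses in config.py may differ.

```python
from typing import List


def stft_num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT: 1 + num_samples // hop_length."""
    return 1 + num_samples // hop_length


def hop_lengths_for_width(num_samples: int, target_frames: int) -> List[int]:
    """All hop lengths that yield exactly target_frames frames for this clip."""
    return [
        hop
        for hop in range(1, num_samples + 1)
        if stft_num_frames(num_samples, hop) == target_frames
    ]


# Hypothetical clip: one second of audio at 16 kHz (not taken from the repo).
num_samples = 16000

# Hop lengths that would give a spectrogram width of 64 for this clip.
print(hop_lengths_for_width(num_samples, 64))
```

If the clip length in your setup is different, plug it in as num_samples to see which hop lengths give a width of 64.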
same issue
@my-yy please refer to @AAA-eng-alt's comment, it is the answer to your issue.
Hi there,
Excited to try this repo! I was trying to get this to run and found that there seems to be some inconsistency between the pretrained model and its expected input shape.
I tried to
librosa.load(audio_file, sr=16000)
and then run the main.embed()
on it as described in the README, and I get an error about tensor shapes not matching. Perhaps I'm missing some extra steps to get there. Appreciate any help you can offer!
Cheers!
I should clarify - I'm running it with the example files provided. There also seems to be a mismatch for the message shapes -> I get (1, 512) vs (16, 2, 512)