Open MicahDoo opened 1 year ago
@apoorvnandan, would you be able to help with the above issue? It relates to your published tutorial here: https://keras.io/examples/audio/transformer_asr/
Hi! Just saw this.
On a cursory glance, it does look like a bug. `input_dim` in the `Embedding` layer, which should be something like 129 (which comes from the STFT of the audio). I'm not 100% sure though. Will try to go through this after work to check if there is something I missed.
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.
https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py
In the code at the above link, I found that `source_maxlen` defaults to 100 in the transformer. The problem, though, is that the inputs are actually padded to length 2754 and then downsampled by the CNN by a factor of 8. The result is a sequence of length 345, which is far greater than 100. Correct me if I am wrong, but I reckon that is a bug?
Problem code:
In the transformer definition, `source_maxlen` defaults to 100: ... which isn't explicitly set at instantiation:
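To make the length mismatch concrete, here is a rough sketch of the arithmetic only, not the tutorial's code. It assumes the factor of 8 comes from three stride-2 convolutions with "same" padding, so each layer maps a length `n` to `ceil(n / 2)`. The helper name `downsampled_length` is made up for illustration.

```python
import math


def downsampled_length(input_len, num_layers=3, stride=2):
    """Length after `num_layers` strided convolutions with "same" padding.

    Assumption: each layer maps a length n to ceil(n / stride).
    """
    length = input_len
    for _ in range(num_layers):
        length = math.ceil(length / stride)
    return length


padded_len = 2754    # padded input length quoted in the issue
source_maxlen = 100  # default quoted in the issue

encoder_len = downsampled_length(padded_len)
print(encoder_len)                  # 345
print(encoder_len > source_maxlen)  # True
```

Under that assumption the encoder sees 345 time steps, so anything sized by `source_maxlen=100` would be too small by a wide margin.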