keras-team / keras-io

Keras documentation, hosted live at keras.io
Apache License 2.0
2.75k stars 2.03k forks source link

`transformer_asr.py`: incorrect `source_maxlen` #1389

Open MicahDoo opened 1 year ago

MicahDoo commented 1 year ago

https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py

In the code at the above link, I found that source_maxlen is defaulted to 100 in the transformer. The problem, though, is that the inputs are actually padded to length 2754, where it's then downsampled with CNN by a factor of 8. The result is a sequence of length 345, which is far greater than 2754. Correct me if I am wrong, but I reckon that is a bug?

Problem code:

In the transformer definition, source_maxlen is defaulted to 100:

class Transformer(keras.Model):
    def __init__(
        self,
        num_hid=64,
        num_head=2,
        num_feed_forward=128,
        source_maxlen=100,
        target_maxlen=100,
        num_layers_enc=4,
        num_layers_dec=1,
        num_classes=10,
    ):

... which isn't explicitly set at instantiation:

model = Transformer(
    num_hid=200,
    num_head=2,
    num_feed_forward=400,
    target_maxlen=max_target_len,
    num_layers_enc=4,
    num_layers_dec=1,
    num_classes=34,
)
sachinprasadhs commented 1 year ago

@apoorvnandan, would you be able to help in the above issue, related to your published tutorial here https://keras.io/examples/audio/transformer_asr/

apoorvnandan commented 1 year ago

Hi! Just saw this.

On a cursory glance, it does look like a bug.

  1. The param name is misleading. That param is used to determine the input_dim in the Embedding layer. which should be something liek 129. (which comes from the stft of audio)
  2. I think it should be explicity set to the above value instead of letting it default.

I'm not a 100% sure though. Will try to go through this after work to check if there is something I missed.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.