`transformer_asr.py`: incorrect `source_maxlen`

MicahDoo commented 1 year ago

https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py

In the code at the above link, I found that source_maxlen is defaulted to 100 in the transformer. The problem, though, is that the inputs are actually padded to length 2754, where it's then downsampled with CNN by a factor of 8. The result is a sequence of length 345, which is far greater than 2754. Correct me if I am wrong, but I reckon that is a bug?

Problem code:

In the transformer definition, source_maxlen is defaulted to 100:

class Transformer(keras.Model):
    def __init__(
        self,
        num_hid=64,
        num_head=2,
        num_feed_forward=128,
        source_maxlen=100,
        target_maxlen=100,
        num_layers_enc=4,
        num_layers_dec=1,
        num_classes=10,
    ):

... which isn't explicitly set at instantiation:

model = Transformer(
    num_hid=200,
    num_head=2,
    num_feed_forward=400,
    target_maxlen=max_target_len,
    num_layers_enc=4,
    num_layers_dec=1,
    num_classes=34,
)

sachinprasadhs commented 1 year ago

@apoorvnandan, would you be able to help in the above issue, related to your published tutorial here https://keras.io/examples/audio/transformer_asr/

apoorvnandan commented 1 year ago

Hi! Just saw this.

On a cursory glance, it does look like a bug.

The param name is misleading. That param is used to determine the input_dim in the Embedding layer. which should be something liek 129. (which comes from the stft of audio)
I think it should be explicity set to the above value instead of letting it default.

I'm not a 100% sure though. Will try to go through this after work to check if there is something I missed.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

keras-team / keras-io

`transformer_asr.py`: incorrect `source_maxlen` #1389

Problem code: