noahchalifour / rnnt-speech-recognition

End-to-end speech recognition using RNN Transducers in Tensorflow 2.0
MIT License

warp-transducer - Is rnnt_loss() "internal state" causing wrong loss computation? #36

Closed · stefan-falk closed this issue 3 years ago

stefan-falk commented 4 years ago

So, I have tried to implement the model more or less "from scratch", based on this repository.

For that I implemented a training loop which I execute eagerly so that I can debug it.

However, in doing so I noticed that my loss just jumps around weirdly, and I still haven't figured out exactly why. To get some insight, I wanted to take a closer look at the rnnt_loss() function. While running some simple test examples, I noticed that calling rnnt_loss() repeatedly on the same input gives a different loss every time. But not just that: it is monotonically increasing.

The code I am running:

from warprnnt_tensorflow import rnnt_loss
import numpy as np

def main():
    # acts: [batch, max_time, max_label_length + 1, vocab] = [1, 3, 2, 3]
    acts = np.asarray([
        [
            [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]],
            [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]],
            [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]],
        ]
    ])

    labels = np.asarray([[1, 2, 0]])          # note: 0 is also warp-transducer's default blank index
    label_lengths = [len(t) for t in labels]  # [3]

    for i in range(10):
        loss = rnnt_loss(
            acts=acts,
            labels=labels,
            input_lengths=label_lengths,
            label_lengths=label_lengths
        )
        print(np.mean(loss))

if __name__ == '__main__':
    main()

Output:

1.0986123
2.1490226
5.274593
6.7222075
9.581686
11.274273
13.95323
15.808798
18.36151
20.329256

I am on tensorflow==2.2.0, and I compiled warp-transducer with GPU support.

noahchalifour commented 3 years ago

@stefan-falk The third dimension of acts needs to be >= label_lengths, as per https://github.com/HawkAaron/warp-transducer/issues/72. In the snippet above, acts has shape [1, 3, 2, 3], so that dimension is only 2 while label_lengths is [3].
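
For reference, a minimal adjustment of the repro that satisfies this constraint might look as follows. This is only a sketch: it assumes the acts layout [batch, max_time, max_label_length + 1, vocab] and warp-transducer's default blank index of 0; the explicit float32/int32 dtypes and the separate input_lengths array are likewise assumptions, not taken from the original snippet.

from warprnnt_tensorflow import rnnt_loss
import numpy as np

def main():
    # acts: [batch, max_time, max_label_length + 1, vocab] = [1, 3, 4, 3]
    # the third dimension is 4 so that 3 labels fit, plus one step for blank
    acts = np.zeros((1, 3, 4, 3), dtype=np.float32)

    # labels avoid 0, since 0 is the blank index by default
    labels = np.asarray([[1, 2, 1]], dtype=np.int32)
    label_lengths = np.asarray([3], dtype=np.int32)

    # input_lengths is the number of time steps, not the label lengths
    input_lengths = np.asarray([3], dtype=np.int32)

    for _ in range(10):
        loss = rnnt_loss(
            acts=acts,
            labels=labels,
            input_lengths=input_lengths,
            label_lengths=label_lengths
        )
        print(np.mean(loss))

if __name__ == '__main__':
    main()

With the shapes consistent, repeated calls on the same input should print the same loss value each time.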