what the use of pred_grads?

Hi, I have studied the source code, but the matrix operations are used so weird that really confuse me. Maybe you can kindly give a good literature about the algorithm with such strange diagonal matrix for us to understand. Now come to the problem, in the example: pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths) Is the pred_loss for tensorflow model loss function? what the use of pred_grads?

And when I check the source code, find the loss loss = -final_state_probs and final_state_probs = beta[:, 0, 0]

the loss is get only from backward_dp() without connection with forward_dp(). So I think the pred_loss can't be used in tensorflow model simply. What's the correct training method for tensorflow, following is correct?

logits = some_deep_network(...)
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)
rnnt_model = tf.keras.Model(inputs=[logits, labels, label_lengths, logit_lengths], outputs=pred_loss)
rnnt_model.compile(optimizer='adam', loss=lambda y_true, y_pred: y_pred)
rnnt_model.fit(...)

iamjanvijay / rnnt

what the use of pred_grads? #5