During the conversion, the LSTM bias of dim [8] is converted to [1, 16]. In this case Wb [1, 8] and Rb [1, 8] are both initialized to zeros if B is not passed; otherwise B is assigned to Wb (and Rb stays zero).
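For concreteness, here is a minimal numpy sketch of the bias layout I mean, assuming hidden_size = 2 so the shapes match ([8] on the Keras side, [1, 16] for B); the iofc/ifco gate reordering is left out for brevity:

```python
import numpy as np

# Hypothetical size: hidden_size = 2, so the Keras/TF LSTM bias has
# 4 * hidden_size = 8 entries and the ONNX LSTM B input has shape
# [num_directions, 8 * hidden_size] = [1, 16].
hidden_size = 2
keras_bias = np.arange(4 * hidden_size, dtype=np.float32)   # shape [8]

# As described above: Wb gets the Keras bias, Rb stays zero.
wb = keras_bias.reshape(1, 4 * hidden_size)                 # shape [1, 8]
rb = np.zeros((1, 4 * hidden_size), dtype=np.float32)       # shape [1, 8]
onnx_B = np.concatenate([wb, rb], axis=1)                   # shape [1, 16]
print(onnx_B.shape)  # (1, 16)
```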
Say I am doing some custom training on the ONNX model and generating gradients for the variables. I can map the gradients back to the Keras model with minor conversions (iofc to ifco, etc.). But how can I map B's gradients? Since the bias is split across Wb and Rb during training, and averaging the gradients of Rb and Wb is not the same as the TF bias gradient for seq_len > 1.
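To be clear about the "minor conversions" part, this is roughly the reordering I mean for the kernel gradients (a sketch, assuming the ONNX W gradient layout [num_directions, 4*hidden_size, input_size] with gate order iofc, and the Keras kernel layout [input_size, 4*units] with gate order ifco); it is only B that I cannot map this way:

```python
import numpy as np

def onnx_w_grad_to_keras(grad_W):
    """Map an ONNX LSTM W gradient [1, 4*hidden, input] with gate order
    i, o, f, c to the Keras kernel layout [input, 4*units] with gate
    order i, f, c, o."""
    i, o, f, c = np.split(grad_W[0], 4, axis=0)    # drop num_directions axis, split per gate
    return np.concatenate([i, f, c, o], axis=0).T  # reorder iofc -> ifco, transpose

# Example with hidden_size = 2 and input_size = 3 (hypothetical sizes).
grad_W = np.random.randn(1, 8, 3).astype(np.float32)
print(onnx_w_grad_to_keras(grad_W).shape)  # (3, 8)
```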
I know it is a special case; I just want to hear your thoughts here.