danilojsl opened this issue 3 years ago
Thanks @danilojsl , can you also share a sample of the Python script that gives you different result?
I think this is because you've set all the weights to zero, so when the input goes through the LSTM equations everything is multiplied by a zero vector. Try setting some of the weights to random values, or to ones. Keras initialises all the weights with glorot_uniform; we've got that too, so you could try to use that.
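The collapse can be seen directly in the LSTM cell equations. Here is a minimal plain-Java sketch (no TensorFlow involved; the class and method names are illustrative) of one cell step with all weights and biases at zero. It reproduces exactly the constant values in the logs at the bottom of the thread: sigmoid(0) = 0.5 for the gates and tanh(0) = 0 for the cell input, so the cell state and hidden output stay at zero.

```java
// Minimal sketch of one LSTM step with all weights and biases set to zero,
// to illustrate why every gate collapses to a constant.
public class ZeroWeightLstm {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One cell of one LSTM step; with zero weights the pre-activations
    // (W*x + U*h_prev + b) are all zero regardless of the input.
    public static double[] step(double csPrev) {
        double preActivation = 0.0;           // zero weights => zero logits
        double i = sigmoid(preActivation);    // input gate  -> 0.5
        double f = sigmoid(preActivation);    // forget gate -> 0.5
        double o = sigmoid(preActivation);    // output gate -> 0.5
        double ci = Math.tanh(preActivation); // cell input  -> 0.0
        double cs = f * csPrev + i * ci;      // cell state  -> 0.0
        double h = o * Math.tanh(cs);         // hidden out  -> 0.0
        return new double[] {i, f, o, ci, cs, h};
    }

    public static void main(String[] args) {
        double[] out = step(0.0);
        System.out.printf(java.util.Locale.ROOT,
                "i=%.1f f=%.1f o=%.1f ci=%.1f cs=%.1f h=%.1f%n",
                out[0], out[1], out[2], out[3], out[4], out[5]);
        // prints: i=0.5 f=0.5 o=0.5 ci=0.0 cs=0.0 h=0.0
    }
}
```

This matches the reported output: gates at 0.5, cell state and hidden output at 0.0, on every run.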
Hi, @karllessard. Sure, here is the Python script.
I assume the LSTM layer in Keras does many more things under the hood, so I will follow @Craigacp's suggestion to initialize all weights with glorot_uniform and let you know how it goes.
Thanks for your fast replies
I was able to get reasonable values for the BlockLSTM output. Since I need to work with TensorFlow Java version 0.2.0, I used TruncatedNormal as an initializer for the weights.
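For reference, the truncated-normal scheme draws from a normal distribution and redraws any value farther than two standard deviations from the mean, which keeps the initial weights small but nonzero. A plain-Java sketch of that idea (the class and method names here are illustrative, not part of the tensorflow-java API):

```java
import java.util.Random;

// Illustrative sketch of truncated-normal initialization: sample from
// N(0, stddev) and redraw anything farther than two standard deviations
// from the mean. Not part of the tensorflow-java API.
public class TruncatedNormalSketch {
    public static float[] sample(int n, float stddev, long seed) {
        Random rng = new Random(seed);
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            double v;
            do {
                v = rng.nextGaussian() * stddev;   // draw from N(0, stddev)
            } while (Math.abs(v) > 2.0 * stddev);  // redraw outliers
            out[i] = (float) v;
        }
        return out;
    }

    public static void main(String[] args) {
        for (float v : sample(5, 0.1f, 42L)) System.out.println(v);
    }
}
```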
Now I'm a little confused, because there is also a BlockLSTMGrad class that, according to the documentation, should be used in conjunction with BlockLSTM.
As far as I understand, the BlockLSTM output will be the input of BlockLSTMGrad, but it requires two arguments that I don't know how to obtain:
csGrad – the current gradient of cs.
hGrad – the gradient of the h vector.
Could you please guide me on how to compute those values?
Do I need to use both BlockLSTM and BlockLSTMGrad to get behavior similar to the Keras LSTM?
I don't think that operation is meant to be used directly; it's added to graphs when you apply gradients to them via an Optimizer. The csGrad and hGrad arguments are the gradients of the loss with respect to the cell state and hidden output, which the framework supplies during backpropagation.
I see. So will BlockLSTM alone be enough to get behavior similar to the Keras LSTM, as in this Python script?
Probably not, the LSTM in Keras is quite complicated and only calls the op we expose at the very end. I've not traced through the computation properly to see what preprocessing we'd need to add to make an equivalent of the Keras LSTM layer. We're still working on building up higher level functionality like that in the Java API, and recurrent nets haven't been started yet.
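To give a sense of the gap: a Keras-equivalent layer has to initialize the kernels and then run the cell recurrence once per timestep, carrying the hidden and cell state forward. A rough plain-Java sketch of that outer loop, assuming a basic LSTM with per-gate weight matrices in the order i, f, ci, o (it omits Keras details such as the fused 4×units kernel, unit_forget_bias, masking and dropout, and the names are illustrative):

```java
// Rough sketch of the forward pass a Keras-style LSTM layer performs on
// top of the raw cell op. w[g] stacks the input weights (inputSize rows)
// over the recurrent weights (units rows) for gate g, in the order
// i, f, ci, o; b[g] is that gate's bias. Illustrative only.
public class KerasStyleLstm {
    static double sig(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static double[][] run(double[][] inputs, double[][][] w,
                                 double[][] b, int units) {
        int steps = inputs.length, in = inputs[0].length;
        double[] h = new double[units], c = new double[units];
        double[][] out = new double[steps][];
        for (int t = 0; t < steps; t++) {
            // Pre-activations for all four gates, using the previous h.
            double[][] pre = new double[4][units];
            for (int g = 0; g < 4; g++)
                for (int j = 0; j < units; j++) {
                    double s = b[g][j];
                    for (int k = 0; k < in; k++) s += inputs[t][k] * w[g][k][j];
                    for (int m = 0; m < units; m++) s += h[m] * w[g][in + m][j];
                    pre[g][j] = s;
                }
            double[] hNew = new double[units];
            for (int j = 0; j < units; j++) {
                double i = sig(pre[0][j]), f = sig(pre[1][j]);
                double ci = Math.tanh(pre[2][j]), o = sig(pre[3][j]);
                c[j] = f * c[j] + i * ci;        // new cell state
                hNew[j] = o * Math.tanh(c[j]);   // new hidden state
            }
            h = hNew;
            out[t] = h.clone();
        }
        return out;  // hidden state at every timestep
    }
}
```

With zero weight tensors this loop reproduces the all-zero output from the original report; with randomly initialized weights each run produces different values, as the Keras layer does.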
I see, thanks for the help anyway. It would be amazing to have that feature. I'm a contributor to the Spark NLP API, and we plan to port as many of our TF operations as possible from Python to Java/Scala. We want to use the tensorflow-java library, with the goal of eventually creating our TF models entirely in Java; starting with our 3.0.0 release, we added tensorflow-java as a dependency. I noticed that there is a Feature Request category when creating a new issue. Would it be ok if I opened a new feature request for this LSTM/RNN layer?
Yes sure please do, that would be something interesting to look at!
CCing @JimClarke5, who is our expert in porting Keras code to Java.
@danilojsl Hi, I'd like to know whether your team has implemented the LSTM layer with tensorflow-java in the spark-nlp project. I'm also developing a project similar to keras-scala and want to implement RNN, LSTM, GRU and Transformer layers using tensorflow-java.
Hi @mullerhai, no, in spark-nlp we haven't implemented that feature and won't in the foreseeable future. We are still waiting for the tensorflow-java contributors to make it available.
Describe the current behavior
I'm using tensorflow-java as a dependency in my project. I wanted to use the BlockLSTM op, but the output always contains the same values, e.g. the cell state output is always all zeros.
Describe the expected behavior
I expected to see results similar to the TF Keras LSTM layer in Python, where each run returns different values, e.g. cell state outputs:
run 1: [0.12028465, 0.07415504, -0.09205371, -0.14372592, 0.00117318]
run 2: [0.07089745, -0.02260131, -0.00052543, -0.19030134, 0.14710784]
Code to reproduce the issue
You can check the code I wrote in this public repo: https://github.com/danilojsl/tensorflow-java-spikes/blob/main/src/main/java/LSTMSpike.java
Other info / logs
Warning: Could not load Loader: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load Pointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load BytePointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
2021-04-01 12:46:24.406136: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: Could not load IntPointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load PointerPointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
2021-04-01 12:46:24.522141: W external/org_tensorflow/tensorflow/core/kernels/rnn/lstm_ops.cc:869] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=5
Input Gate: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Cell State: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Forget State: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Output Gate: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Cell Input: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Cell Output: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Hidden Output: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]