danilojsl opened this issue 3 years ago
Thanks @danilojsl , can you also share a sample of the Python script that gives you different result?
I think this is because you've set all the weights to zero, so when the input goes through the LSTM equations everything is multiplied by a zero vector. Try setting some of the weights to random values, or to ones. Keras initialises all the weights with glorot_uniform; we've got that too, so you could try to use that.
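The collapse can be seen directly in the LSTM cell equations. Here is a minimal plain-Java sketch (no TensorFlow involved; the class and method names are illustrative) of one cell step with all weights and biases at zero. It reproduces exactly the constant values in the logs at the bottom of the thread: sigmoid(0) = 0.5 for the gates and tanh(0) = 0 for the cell input, so the cell state and hidden output stay at zero.

```java
// Minimal sketch of one LSTM step with all weights and biases set to zero,
// to illustrate why every gate collapses to a constant.
public class ZeroWeightLstm {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One cell of one LSTM step; with zero weights the pre-activations
    // (W*x + U*h_prev + b) are all zero regardless of the input.
    public static double[] step(double csPrev) {
        double preActivation = 0.0;           // zero weights => zero logits
        double i = sigmoid(preActivation);    // input gate  -> 0.5
        double f = sigmoid(preActivation);    // forget gate -> 0.5
        double o = sigmoid(preActivation);    // output gate -> 0.5
        double ci = Math.tanh(preActivation); // cell input  -> 0.0
        double cs = f * csPrev + i * ci;      // cell state  -> 0.0
        double h = o * Math.tanh(cs);         // hidden out  -> 0.0
        return new double[] {i, f, o, ci, cs, h};
    }

    public static void main(String[] args) {
        double[] out = step(0.0);
        System.out.printf(java.util.Locale.ROOT,
                "i=%.1f f=%.1f o=%.1f ci=%.1f cs=%.1f h=%.1f%n",
                out[0], out[1], out[2], out[3], out[4], out[5]);
        // prints: i=0.5 f=0.5 o=0.5 ci=0.0 cs=0.0 h=0.0
    }
}
```

This matches the reported output: gates at 0.5, cell state and hidden output at 0.0, on every run.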
Hi, @karllessard. Sure, here is the Python script.
I assume the LSTM layer in Keras does many more things under the hood, so I will follow @Craigacp's suggestion to initialize all weights with glorot_uniform and let you know how it goes.
Thanks for your fast replies
I was able to get reasonable values for the BlockLSTM output. Since I need to work with TensorFlow Java version 0.2.0, I used TruncatedNormal as an initializer for the weights.
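For reference, the truncated-normal scheme draws from a normal distribution and redraws any value farther than two standard deviations from the mean, which keeps the initial weights small but nonzero. A plain-Java sketch of that idea (the class and method names here are illustrative, not part of the tensorflow-java API):

```java
import java.util.Random;

// Illustrative sketch of truncated-normal initialization: sample from
// N(0, stddev) and redraw anything farther than two standard deviations
// from the mean. Not part of the tensorflow-java API.
public class TruncatedNormalSketch {
    public static float[] sample(int n, float stddev, long seed) {
        Random rng = new Random(seed);
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            double v;
            do {
                v = rng.nextGaussian() * stddev;   // draw from N(0, stddev)
            } while (Math.abs(v) > 2.0 * stddev);  // redraw outliers
            out[i] = (float) v;
        }
        return out;
    }

    public static void main(String[] args) {
        for (float v : sample(5, 0.1f, 42L)) System.out.println(v);
    }
}
```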
Now I'm a little confused, because there is also a BlockLSTMGrad class that, according to the documentation, should be used in conjunction with BlockLSTM.
As far as I understand, the BlockLSTM output will be the input of BlockLSTMGrad, but it requires two arguments that I don't know how to obtain:
csGrad – the current gradient of cs.
hGrad – the gradient of the h vector.
Could you please guide me on how to compute those values?
Do I need to use both BlockLSTM and BlockLSTMGrad to get behavior similar to the Keras LSTM?
I don't think that operation is meant to be used directly; it's added to graphs when you apply gradients to them via an Optimizer. The csGrad and hGrad arguments are the gradients of the loss with respect to the cell state and hidden output, which the framework supplies during backpropagation.
I see. So will BlockLSTM alone be enough to get behavior similar to the Keras LSTM, as in this Python script?
Probably not, the LSTM in Keras is quite complicated and only calls the op we expose at the very end. I've not traced through the computation properly to see what preprocessing we'd need to add to make an equivalent of the Keras LSTM layer. We're still working on building up higher level functionality like that in the Java API, and recurrent nets haven't been started yet.
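To give a sense of the gap: a Keras-equivalent layer has to initialize the kernels and then run the cell recurrence once per timestep, carrying the hidden and cell state forward. A rough plain-Java sketch of that outer loop, assuming a basic LSTM with per-gate weight matrices in the order i, f, ci, o (it omits Keras details such as the fused 4×units kernel, unit_forget_bias, masking and dropout, and the names are illustrative):

```java
// Rough sketch of the forward pass a Keras-style LSTM layer performs on
// top of the raw cell op. w[g] stacks the input weights (inputSize rows)
// over the recurrent weights (units rows) for gate g, in the order
// i, f, ci, o; b[g] is that gate's bias. Illustrative only.
public class KerasStyleLstm {
    static double sig(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static double[][] run(double[][] inputs, double[][][] w,
                                 double[][] b, int units) {
        int steps = inputs.length, in = inputs[0].length;
        double[] h = new double[units], c = new double[units];
        double[][] out = new double[steps][];
        for (int t = 0; t < steps; t++) {
            // Pre-activations for all four gates, using the previous h.
            double[][] pre = new double[4][units];
            for (int g = 0; g < 4; g++)
                for (int j = 0; j < units; j++) {
                    double s = b[g][j];
                    for (int k = 0; k < in; k++) s += inputs[t][k] * w[g][k][j];
                    for (int m = 0; m < units; m++) s += h[m] * w[g][in + m][j];
                    pre[g][j] = s;
                }
            double[] hNew = new double[units];
            for (int j = 0; j < units; j++) {
                double i = sig(pre[0][j]), f = sig(pre[1][j]);
                double ci = Math.tanh(pre[2][j]), o = sig(pre[3][j]);
                c[j] = f * c[j] + i * ci;        // new cell state
                hNew[j] = o * Math.tanh(c[j]);   // new hidden state
            }
            h = hNew;
            out[t] = h.clone();
        }
        return out;  // hidden state at every timestep
    }
}
```

With zero weight tensors this loop reproduces the all-zero output from the original report; with randomly initialized weights each run produces different values, as the Keras layer does.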
I see, thanks for the help anyway. It would be amazing to have that feature. I'm a contributor to the Spark NLP API, and we plan to port as many of our TF operations as possible from Python to Java/Scala. We want to use the tensorflow-java library, with the goal of eventually creating our TF models entirely in Java; starting with our 3.0.0 release, we added tensorflow-java as a dependency. I noticed that there is a Feature Request category when creating a new issue. Would it be ok if I opened a new feature request for this LSTM/RNN layer?
Yes sure please do, that would be something interesting to look at!
CCing @JimClarke5, who is our expert in porting Keras code to Java.
@danilojsl Hi, I'd like to know whether your team has implemented the LSTM layer with tensorflow-java in the spark-nlp project. I'm also developing a project similar to keras-scala and want to implement RNN, LSTM, GRU and Transformer layers using tensorflow-java.
Hi @mullerhai, no, in spark-nlp we haven't implemented that feature and won't in the foreseeable future. We are still waiting for the tensorflow-java contributors to make it available.
Describe the current behavior
I'm using tensorflow-java as a dependency in my project. I wanted to use the BlockLSTM op, but the output always contains the same values, e.g. the cell state output is always all zeros.
Describe the expected behavior
I expected to see results similar to the TF Keras LSTM layer in Python, where each run returns different values, e.g. cell state outputs:
run 1: [0.12028465, 0.07415504, -0.09205371, -0.14372592, 0.00117318]
run 2: [0.07089745, -0.02260131, -0.00052543, -0.19030134, 0.14710784]
Code to reproduce the issue
You can check the code I wrote in this public repo: https://github.com/danilojsl/tensorflow-java-spikes/blob/main/src/main/java/LSTMSpike.java
Other info / logs
Warning: Could not load Loader: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load Pointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load BytePointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
2021-04-01 12:46:24.406136: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: Could not load IntPointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Warning: Could not load PointerPointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
2021-04-01 12:46:24.522141: W external/org_tensorflow/tensorflow/core/kernels/rnn/lstm_ops.cc:869] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=5
Input Gate: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Cell State: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Forget State: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Output Gate: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
Cell Input: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Cell Output: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Hidden Output: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]