Closed (AyushiAggarwal closed this issue 7 years ago)
I reproduced the exact failures and found the TensorFlow commit that introduced this change.
Here is a slight (very temporary and janky) workaround, to apply after you have git-cloned models:
cd models/syntaxnet/tensorflow/tensorflow
git checkout 712fcfc6e364e6ca39cee3d988089e51f73d1e65
(this revision predates the commit that introduced the change)
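Then, from the same directory, edit core/platform/default/mutex.h (that is, models/syntaxnet/tensorflow/tensorflow/core/platform/default/mutex.h):
nano core/platform/default/mutex.h
and change this: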
#include "nsync_cv.h"
#include "nsync_mu.h"
to this:
#include "../nsync/public/nsync_cv.h"
#include "../nsync/public/nsync_mu.h"
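If this edit has to be repeated (for instance in a scripted or Docker build), a small script can apply it instead of nano. This is a minimal Python 3 sketch, not part of the original workaround: the helper names and the file name patch_mutex.py are hypothetical, and it assumes the two include lines appear exactly as shown above, with no trailing whitespace.

```python
# Map the bare nsync includes to the relative paths used by the workaround.
_REPLACEMENTS = {
    '#include "nsync_cv.h"': '#include "../nsync/public/nsync_cv.h"',
    '#include "nsync_mu.h"': '#include "../nsync/public/nsync_mu.h"',
}


def rewrite_nsync_includes(text):
    """Return `text` with the bare nsync includes replaced (hypothetical helper)."""
    out_lines = []
    for line in text.splitlines(True):  # True keeps the original line endings
        stripped = line.rstrip("\r\n")
        ending = line[len(stripped):]
        out_lines.append(_REPLACEMENTS.get(stripped, stripped) + ending)
    return "".join(out_lines)


def patch_file(path):
    """Rewrite the includes in place; return True if the file changed."""
    with open(path) as f:
        original = f.read()
    patched = rewrite_nsync_includes(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original


if __name__ == "__main__":
    import sys
    # Example: python patch_mutex.py core/platform/default/mutex.h
    for p in sys.argv[1:]:
        print(p, "patched" if patch_file(p) else "unchanged")
```

Because already-rewritten lines are not keys in the mapping, running the script twice leaves the file unchanged.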
The workaround doesn't solve the problem completely, since the last test still fails for me. But ParseyMcParseface (/opt/tensorflow/syntaxnet/syntaxnet/demo.sh) works, at least for the purposes of my project.
/CC @ebrevdo @calberti
This is caused by a new feature, the variant op registrar, being called more than once. Perhaps framework/tensor.cc is being linked in or compiled multiple times?
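One way to picture this failure is a toy registry that refuses duplicate type names. This is a Python sketch under stated assumptions, not TensorFlow's C++ code: `StrictRegistry`, `DuplicateRegistration` and `run_static_initializers` are hypothetical stand-ins for the variant op registry, its CHECK_EQ abort, and the static registration code in tensor.cc. Running the "static initializer" a second time hits the duplicate check and produces the same message as the test logs.

```python
class DuplicateRegistration(Exception):
    """Stands in for the CHECK_EQ(existing, nullptr) abort (hypothetical)."""


class StrictRegistry(object):
    """Toy registry: one decode function per type name, duplicates rejected."""

    def __init__(self):
        self._decode_fns = {}

    def register_decode_fn(self, type_name, fn):
        existing = self._decode_fns.get(type_name)
        if existing is not None:
            # Mirrors the wording of the error seen in the test logs.
            raise DuplicateRegistration(
                "Unary VariantDecodeFn for type_name: %s already registered"
                % type_name)
        self._decode_fns[type_name] = fn


def decode_tensor(data):
    return data  # placeholder decode function


def run_static_initializers(registry):
    """Stands in for a static REGISTER_... macro that runs when code is loaded."""
    registry.register_decode_fn("tensorflow::Tensor", decode_tensor)


registry = StrictRegistry()
run_static_initializers(registry)      # first run: succeeds
try:
    run_static_initializers(registry)  # same code runs again: rejected
except DuplicateRegistration as e:
    print(e)
```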
I followed @charlesjohannisen's instructions to check out the version from before the errant commit (the one you referenced above). This seems to have eliminated the error "type_name: tensorflow::Tensor already registered".
However, bazel test now fails with a new error: ImportError: No module named autograd
Log file:
Console output: Executed 25 out of 25 tests: 17 tests pass and 8 fail locally
Screenshot of tests that failed is given below:
This is still an ops issue. Any further ideas on how to handle it?
@charlesjohannisen - Could you list the commands you ran after editing the models/syntaxnet/tensorflow/tensorflow/core/platform/default/mutex.h file? I would like to compare them with my installation steps to understand the ImportError.
sudo apt install gfortran
then
sudo python -m pip install autograd
should fix the last issue. Hopefully that gets you back on track.
@charlesjohannisen Yes, that solved the issue! Thanks!
I also had to install enum and enum34.
bazel test ran fine:
Executed 25 out of 25 tests: 25 tests pass
ParseyMcParseface demo.sh works as expected.
Now awaiting the bug fix and the corresponding update to the official README.
It looks like this fix no longer works after the repository was restructured. Will it be fixed?
Same issue. And as @rekcahd said, the file fix doesn't work since the directory structure was changed, which I guess happened within the last month.
I am also experiencing this issue.
same issue here.
Same problem!
Same problem. Any updates on issue status?
I managed to build and test without errors. To do this, I just commented out CHECK_EQ(existing, nullptr) in RegisterShapeFn, RegisterDecodeFn, RegisterUnaryOpFn and RegisterBinaryOpFn.
Furthermore, I had to install the pygraphviz Python package:
apt-get install graphviz libgraphviz-dev
pip install pygraphviz --install-option="--include-path=/usr/include/graphviz" --install-option="--library-path=/usr/lib/graphviz/"
Strangely, I had to modify syntaxnet/dragnn/python/component.py. In def build_greedy_training(self, state, network_states), I changed:
    with tf.control_dependencies([tf.assert_equal(self.training_beam_size, 1)]):
      stride = state.current_batch_size * self.training_beam_size
to:
    val = tf.Print(self.training_beam_size, [self.training_beam_size], "Fix for access bug. Correct value: ")
    with tf.control_dependencies([tf.assert_equal(val, 1)]):
      stride = state.current_batch_size * self.training_beam_size
Just adding the print command fixes the error. Without the print, the value of self.training_beam_size appears to be 8, although it is really 1. The print convinces the system to use the correct value. Very, very strange.
EDIT 1: Here is the diff file: models_diff.txt
EDIT 2: A hint for anyone who wants to properly fix the CHECK_EQ() problem:
models/research/syntaxnet/tensorflow/tensorflow/core/framework/tensor.cc statically registers REGISTER_UNARY_VARIANT_DECODE_FUNCTION(Tensor, "tensorflow::Tensor");
models/research/syntaxnet/tensorflow/tensorflow/core/framework/variant_op_registry.cc statically registers REGISTER_VARIANT_SHAPE_TYPE(int); ... etc.
Both pieces of code are included in _pywrap_tensorflow_internal.so and, at least partly, in parser_ops.so. During the tests, _pywrap_tensorflow_internal.so is initialized both by _PyImport_LoadDynamicModule and by tensorflow::LoadLibrary, so the static registration code runs twice, which causes the error. As far as I can see, commenting out the CHECK_EQ does no harm in this case, given the nature of this registration. I think the proper solution would be either to move the static code somewhere it is not executed twice, or to change the Bazel build scripts so that the same code is not included in two different shared libraries.
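Why commenting out the check is harmless here can be sketched with the same kind of toy model. This assumes that, with the CHECK_EQ removed, the registry's map insert simply leaves an existing entry in place (which is how std::unordered_map::insert behaves for an existing key). Since both copies of the static code register equivalent functions under the same name, the end state matches registering once. `FirstWinsRegistry` and the two decode functions are hypothetical Python stand-ins, not TensorFlow code.

```python
class FirstWinsRegistry(object):
    """Toy registry that silently keeps the first entry for each type name."""

    def __init__(self):
        self._decode_fns = {}

    def register_decode_fn(self, type_name, fn):
        # dict.setdefault, like std::unordered_map::insert, never overwrites.
        self._decode_fns.setdefault(type_name, fn)

    def lookup(self, type_name):
        return self._decode_fns.get(type_name)


def decode_tensor_copy_a(data):
    return ("decoded", data)  # code compiled into the first shared library


def decode_tensor_copy_b(data):
    return ("decoded", data)  # identical code compiled into the second one


registry = FirstWinsRegistry()
registry.register_decode_fn("tensorflow::Tensor", decode_tensor_copy_a)
registry.register_decode_fn("tensorflow::Tensor", decode_tensor_copy_b)  # ignored

fn = registry.lookup("tensorflow::Tensor")
print(fn is decode_tensor_copy_a, fn(b"x"))
```

The sketch also shows the limit of this workaround: if the two registrations ever installed different functions, the second would be dropped silently, which is why building the code into only one shared library is the cleaner fix.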
The problem with syntaxnet/dragnn/python/component.py seems to me to be a serious problem inside the TensorFlow core or in the Python-to-C++ connection. Happy to learn of any better explanation.
With @charlesjohannisen's and @GelRa's solutions, all bazel tests now pass for me on Ubuntu 16.04.
@calberti @andorardo @bogatyy @markomernick - Mind taking a look to make the appropriate fix?
I wonder if we can just remove the tensor decoder registration in TF proper. I'm not sure it's used for anything other than testing right now. However, it would be good to fix this in Parsey as well.
(Email reply to Asim Shankar's comment of Fri, Nov 3, 2017 at 12:14 AM: https://github.com/tensorflow/models/issues/2355#issuecomment-341634734)
Thank you for reporting the issue in detail and looking at the possible workarounds. We are working on this.
Could you try again? A longer-term solution is not ready yet, but we have pushed a fix.
More specifically, we synced the TF subrepo to a version from before the registration mechanism was introduced. Right now, SyntaxNet should build with Bazel 0.5.4 (this is mentioned in the README as well).
Sorry @bogatyy, but despite your updates I get 25 local failures; I suggest reopening the issue. (Side note: I am installing with TensorFlow 1.4. Was that your case too?)
Looking into the test logs, the main source of errors is the dependency on the autograd Python package:
cat /root/.cache/bazel/_bazel_root/3b4c7ccb85580bc382ce4a52e9580003/execroot/__main__/bazel-out/local-opt/testlogs/syntaxnet/util/resources_test/test.log
from autograd import core as ag_core
ImportError: No module named autograd
Running pip install autograd installs the package successfully, but the tests then fail with a different import error:
cat /root/.cache/bazel/_bazel_root/3b4c7ccb85580bc382ce4a52e9580003/execroot/__main__/bazel-out/local-opt/testlogs/syntaxnet/util/resources_test/test.log
from autograd import container_types
ImportError: cannot import name container_types
Do you have any idea when an actual fix might be out?
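With many failing tests, it can help to see which known error each test.log contains. This is a hypothetical helper, assuming Bazel's testlogs layout seen in the cat commands above (one test.log per test directory, reachable through the bazel-testlogs symlink in the workspace); the signature strings are copied from the Python 2 error messages quoted in this thread, and the file name triage_logs.py is illustrative.

```python
import io
import os

# Error signatures quoted in this thread (Python 2 wording), mapped to labels.
SIGNATURES = [
    ("already registered", "duplicate variant registration"),
    ("No module named autograd", "autograd not installed"),
    ("cannot import name container_types", "incompatible autograd version"),
]


def classify_log(text):
    """Return the label of every known signature that occurs in `text`."""
    return [label for needle, label in SIGNATURES if needle in text]


def scan_testlogs(root):
    """Map each test directory under `root` that has a test.log to its labels."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "test.log" in filenames:
            path = os.path.join(dirpath, "test.log")
            with io.open(path, encoding="utf-8", errors="replace") as f:
                results[os.path.relpath(dirpath, root)] = classify_log(f.read())
    return results


if __name__ == "__main__":
    import sys
    # Example: python triage_logs.py bazel-testlogs
    for root in sys.argv[1:]:
        for test, labels in sorted(scan_testlogs(root).items()):
            print(test, labels or ["unknown"])
```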
@SaintNazaire you need to install a compatible version of autograd, as explained in the README:
pip install autograd==1.1.13
Let me know if that works (also, again, make sure you have Bazel 0.5.4)
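Before rerunning bazel test, a quick pre-flight check can confirm both prerequisites mentioned here. This is a sketch, assuming that `bazel version` prints a line starting with "Build label:" and that the two imports failing in the logs above (`from autograd import core`, `from autograd import container_types`) are a reasonable proxy for a compatible autograd; the helper names are hypothetical. Run it with the same Python interpreter that Bazel uses for the tests.

```python
from __future__ import print_function

import importlib
import subprocess

EXPECTED_BAZEL = "0.5.4"


def can_import(module, name=None):
    """True if `from module import name` (or just `import module`) would work."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        return False
    if name is None or hasattr(mod, name):
        return True
    try:
        importlib.import_module(module + "." + name)  # `name` may be a submodule
        return True
    except ImportError:
        return False


def parse_bazel_version(output):
    """Extract the version from `bazel version` output, or None if absent."""
    for line in output.splitlines():
        if line.startswith("Build label:"):
            return line.split(":", 1)[1].strip()
    return None


def preflight():
    problems = []
    for module, name in [("autograd", "core"), ("autograd", "container_types")]:
        if not can_import(module, name):
            problems.append("cannot import %s from %s" % (name, module))
    try:
        out = subprocess.check_output(["bazel", "version"])
        version = parse_bazel_version(out.decode("utf-8", "replace"))
    except (OSError, subprocess.CalledProcessError):
        version = None  # bazel missing or failed to run
    if version != EXPECTED_BAZEL:
        problems.append("bazel %s found, %s expected" % (version, EXPECTED_BAZEL))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PROBLEM:", problem)
```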
@bogatyy it works, thank you very much.
Here is a fully tested Dockerfile for your reference.
Warning: building locally requires at least 3840 MB of RAM on 3 CPUs. For example, on a Windows 10 machine, click the hidden icons in the taskbar, right-click the Docker icon and open Settings to adjust this. Building on Docker Hub is limited to 2 GB and will fail.
Great to hear, closing the issue then.
Same problem!
This fixed my problem, thanks!
System information
bazel test --linkopt=-lrt syntaxnet/... util/utf8/...
Problem:
I have tried installing SyntaxNet to run ParseyMcParseface on CentOS 6/7 and Ubuntu 14.04 LTS, and got stuck on the same error each time (described below). I followed the instructions for manual installation of SyntaxNet given at https://github.com/tensorflow/models/tree/master/syntaxnet and ran bazel test with the following command:
bazel test --linkopt=-lrt syntaxnet/... util/utf8/...
Console output: Executed 25 out of 25 tests: 19 tests pass and 6 fail locally
Screenshot of tests that failed is given below:
Log file: for each of the failures, the test.log file shows the following error:
F external/org_tensorflow/tensorflow/core/framework/variant_op_registry.cc:79] Check failed: existing == nullptr (0x10a0f30 vs. nullptr)Unary VariantDecodeFn for type_name: tensorflow::Tensor already registered
Aborted
I have struggled with this for days now, and any help or fix would be appreciated. I found nothing on the Bazel Stack Overflow channel.