Closed seanpmorgan closed 5 years ago
@yifeif @goldiegadde -- any clues here?
I don't think this is related to #130, but thought I'd mention it so we can keep track of the issues we're having linking with tf2-nightly.
Pretty sure this is related to: https://github.com/tensorflow/tensorflow/issues/27067
SetShapeFn has issues when compiled with gcc5. I have since rebuilt 0.2.1 using gcc4.8 so the original issue is likely solved, but there is a new issue explained below.
Marking this as blocked because tf2-nightly and tf2-alpha have different shared library names so our builds wouldn't work on both of them regardless.
tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so: cannot open shared object file: No such file or directory
Described in #130
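For anyone checking which shared library name their install ships, here is a minimal sketch. The helper name `framework_libs` is made up for illustration, and the demo runs on a throwaway directory, not a real TensorFlow install.

```python
import tempfile
from pathlib import Path


def framework_libs(lib_dir):
    """Return the sorted names of libtensorflow_framework libraries in lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("libtensorflow_framework.so*"))


# Demo on a temporary directory that mimics an install shipping the ".so.2" name.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "libtensorflow_framework.so.2").touch()
    (Path(d) / "unrelated.so").touch()
    demo = framework_libs(d)

print(demo)
```

With TensorFlow installed, `framework_libs(tf.sysconfig.get_lib())` should list whatever name that particular build provides, which you can compare against the name in the NotFoundError.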
Is there any workaround to get tensorflow addons working with tf2-nightly?
Build from source would be the only way for now (tip: use the docker image recommended in the contributing guide). As soon as we can get a tf2-beta version that includes the new so versioning then it'll be compatible with both again.
Just a note though: there was a breaking change last night which will be fixed once #135 gets merged
Alternatively, we could set up a nightly addons package, which would work, but that requires #76 to be finished.
@seanpmorgan I was trying to build from sources with the dockerfile, but it is building Python v2.7, how do I change this to Python 3.6? I started editing files, but it was so many changes I think I was doing it wrong.
Ah yeah, apologies we have no automated build script yet for you to see an example (should soon though).
You can install python3.6 and set your paths appropriately before building. Alternatively, you can just install Miniconda and create a new env. Switching to it with conda activate will set the paths correctly, and you can then build as py36.
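A quick way to confirm which interpreter the build will pick up after activating the env is sketched below. The function name `is_expected_python` is hypothetical and not part of any addons tooling.

```python
import sys


def is_expected_python(expected=(3, 6), version_info=None):
    """Return True if the (major, minor) version matches `expected`.

    `version_info` defaults to the running interpreter; pass a tuple to check
    some other version.
    """
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(expected)


# Show which interpreter is first on PATH and whether it is the 3.6 we want.
print(sys.executable, is_expected_python())
```

If this prints a 2.7 path or False inside the container, the env was not activated in the shell that runs configure.sh.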
@seanpmorgan I thought this was going to be simpler, but after trying many things I'm now stuck :-(. This is what I did. It is building, but I get an error: "tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory"
$ git clone https://github.com/tensorflow/addons.git
$ cd addons
$ docker run --rm -it -v ${PWD}:/addons -w /addons tensorflow/tensorflow:nightly-custom-op /bin/bash
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
$ chmod +x miniconda.sh
$ ./miniconda.sh (using path => /addons/miniconda/)
$ export PATH="/addons/miniconda/bin:$PATH"
$ conda create -n py36 python=3.6
$ source activate py36
$ ./configure.sh
$ bazel build build_pip_pkg
This fails to build, so I edited the generated file at line 3799:
$ vi /addons/bazel-addons/external/local_config_tf/BUILD
from:
genrule(
    name = "libtensorflow_framework.so.1",
    outs = [
        "libtensorflow_framework.so.1",
    ],
    cmd = """
    cp -f "/usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2" "$(@D)/libtensorflow_framework.so.1"
    """,
)
to:
genrule(
    name = "libtensorflow_framework.so.1",
    outs = [
        "libtensorflow_framework.so.2",
    ],
    cmd = """
    cp -f "/usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2" "$(@D)/libtensorflow_framework.so.2"
    """,
)
Running again, it now works:
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg artifacts
=> success!!
$ exit (to leave docker)
$ cd artifacts
$ pip install --upgrade --force-reinstall tensorflow_addons-0.2.0.dev0-cp36-cp36m-linux_x86_64.whl
When I try:
import tensorflow as tf
import tensorflow_addons as tfa
I get the fatal error:
tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory
@alew3 sorry for the troublesome experience, we're making some daily breaking changes against nightly. Can you check the daily version of the tf2-alpha you installed? Yesterday the framework library was named so.1.
It should be 20190403: https://pypi.org/project/tf-nightly-2.0-preview/#history
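To check the build date of whatever nightly is installed, something like the sketch below could help. It assumes the nightly version string embeds the date as `devYYYYMMDD` (for example `2.0.0.dev20190403`). The helper name `nightly_build_date` is made up for this example.

```python
import re
from datetime import date


def nightly_build_date(version):
    """Extract the build date from a version string containing 'devYYYYMMDD'.

    Returns None when the string carries no such date (e.g. a tagged release).
    """
    match = re.search(r"dev(\d{4})(\d{2})(\d{2})", version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


print(nightly_build_date("2.0.0.dev20190403"))
```

With TensorFlow installed you would pass `tf.__version__` and compare the result against the date above.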
@seanpmorgan thanks for your help! I just installed the latest tf-nightly and now it doesn't even load tensorflow, so I guess things are a bit turbulent at the moment .. :-) I'm going to revert to an older version.
Closing this issue as there is a lot of info that is not relevant anymore. Will re-make shortly.
Are there any updates on this issue? Custom ops built with gcc 7.4 seem to hit it as well. I get a runtime error when loading custom ops built with gcc 7.4 (tf 1.14.0 was built with gcc 7.4 too):
...
File "~/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ~/custom_ops/voxelization.so: undefined symbol: _ZN10tensorflow12OpDefBuilder10SetShapeFnEPFNS_6StatusEPNS_15shape_inference16InferenceContextEE
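Two different failures show up in this thread under the same NotFoundError: a shared library that cannot be found at all, and a library that loads but references a symbol the installed TensorFlow does not export. Here is a rough sketch for telling them apart from the message text. The function name and the category labels are illustrative only.

```python
import re


def classify_load_error(message):
    """Classify a dynamic-loader message as (kind, detail).

    kind is 'undefined_symbol', 'missing_library', or 'unknown'; detail is the
    mangled symbol or the library name when one can be extracted.
    """
    match = re.search(r"undefined symbol: (\S+)", message)
    if match:
        return ("undefined_symbol", match.group(1))
    match = re.search(r"(\S+): cannot open shared object file", message)
    if match:
        return ("missing_library", match.group(1))
    return ("unknown", None)


msg = ("tensorflow.python.framework.errors_impl.NotFoundError: "
       "libtensorflow_framework.so.2: cannot open shared object file: "
       "No such file or directory")
print(classify_load_error(msg))
```

A missing library usually points at a name or path mismatch between the build and the installed TensorFlow. An undefined symbol points at an ABI or version mismatch. Passing the mangled name through `c++filt` shows the signature the op library expects.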
Same runtime issue when building from source on the v1.15.2 tag with no custom op, using gcc 7.5.0:
Traceback (most recent call last):
File "BERT_NER.py", line 625, in <module>
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "BERT_NER.py", line 505, in main
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/util/lazy_loader.py", line 63, in __getattr__
return getattr(module, item)
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "/opt/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/__init__.py", line 54, in <module>
from tensorflow.contrib import image
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/__init__.py", line 55, in <module>
from tensorflow.contrib.image.python.ops.dense_image_warp import dense_image_warp
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/__init__.py", line 57, in <module>
from tensorflow.contrib.image.python.ops.distort_image_ops import adjust_hsv_in_yiq
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/python/ops/distort_image_ops.py", line 29, in <module>
resource_loader.get_path_to_datafile('_distort_image_ops.so'))
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/util/loader.py", line 56, in load_op_library
ret = load_library.load_op_library(path)
File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/python/ops/_distort_image_ops.so: undefined symbol: _ZN4absl5Mutex10ReaderLockEv
Hi @YanzheL. TF-Addons is only compatible with TF2.X and so building against 1.15.2 is unsupported. It looks as though you're getting this symbol mismatch on tf.contrib so you may want to raise this issue on the tensorflow/tensorflow repo. Which part of addons were you trying to use?
While packaging the 0.2.0 release of addons, I realized that if the pip package was built against tensorflow==2.0.0-alpha0, the installed package would not work correctly on the tf2-nightly install. The error is:
Digging in further, I thought that something may have gone awry in the build for this version, but when I checked a previous version (0.1.1) I noticed it also failed to load on the current nightly. Typically I've seen this type of error when using gcc>=5, but the solution provided is to set -D_GLIBCXX_USE_CXX11_ABI=0, which we do.
Some things to note in the investigation:
Has there been a recent change in how tf2-nightly is packaged compared to tf2-alpha? Going forward we can mention that addons should be used with tf2-alpha, and just use nightly for testing... but there seems to be an underlying issue that needs to be fixed.
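One way to compare how two TensorFlow builds were compiled is to look at the ABI define in their reported compile flags. The sketch below only parses a list of flags. The helper name `cxx11_abi_setting` is made up, and the sample flags are illustrative, not taken from a real install.

```python
def cxx11_abi_setting(compile_flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI in a list of compiler flags.

    Returns None when the define is absent, in which case the compiler default
    applies.
    """
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


# Illustrative flags only; a real install reports its own include path.
sample_flags = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
print(cxx11_abi_setting(sample_flags))
```

With TensorFlow installed, the list from `tf.sysconfig.get_compile_flags()` can be passed in directly. Running it under both alpha and nightly would show whether the two packages disagree on this setting.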
Colab Notebooks:
Addons-0.1.1 + Alpha
Addons-0.1.1 + Nightly
Addons-0.2.0 + Alpha
Addons-0.2.0 + Nightly
cc @gunan @yifeif