tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Packaged Addons don't run on tf-nightly-2.0 #112

Closed seanpmorgan closed 5 years ago

seanpmorgan commented 5 years ago

While packaging the 0.2.0 release of addons, I realized that if the pip package was built against tensorflow==2.0.0-alpha0, the installed package would not work correctly on the tf2-nightly install.

The error is:

NotFoundError: /usr/local/lib/python3.6/dist-packages/tensorflow_addons/custom_ops/image/python/_distort_image_ops.so: undefined symbol: _ZN10tensorflow12OpDefBuilder10SetShapeFnEPFNS_6StatusEPNS_15shape_inference16InferenceContextEE
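To see which C++ function the loader could not find, the mangled name in the error can be partly decoded by hand. The sketch below is only an illustration: `nested_name` is a hypothetical helper that handles the simple `_ZN<length><name>...E` form of the Itanium C++ name mangling and ignores the parameter types. A tool such as `c++filt` would give the full signature.

```python
def nested_name(mangled):
    """Return the '::'-joined qualified name of a simple Itanium-mangled symbol.

    Only the `_ZN<len><name>...E` prefix is decoded; parameter types are ignored.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a simple nested Itanium-mangled name")
    pos = len(prefix)
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


symbol = (
    "_ZN10tensorflow12OpDefBuilder10SetShapeFnEPFNS_6StatusEPNS_"
    "15shape_inference16InferenceContextEE"
)
print(nested_name(symbol))  # tensorflow::OpDefBuilder::SetShapeFn
```

So the missing symbol is `SetShapeFn` on `tensorflow::OpDefBuilder`, as exported by the TensorFlow build the op library was compiled against.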

Digging in further, I thought that something may have gone awry in the build for this version, but when I checked a previous version (0.1.1) I noticed it also failed to load on the current nightly. Typically I've seen this type of error when using gcc>=5, but the solution provided is to set -D_GLIBCXX_USE_CXX11_ABI=0, which we do.
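One way to confirm which ABI setting an installed TensorFlow expects is to look at the compile flags it reports. This is a minimal sketch, not the project's own check: `cxx11_abi_from_flags` and `installed_tf_abi` are hypothetical helper names, and it assumes the installed TensorFlow exposes `tf.sysconfig.get_compile_flags()`.

```python
def cxx11_abi_from_flags(flags):
    """Return the integer value of _GLIBCXX_USE_CXX11_ABI found in flags, or None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


def installed_tf_abi():
    """Read the ABI setting reported by the installed TensorFlow (assumed present)."""
    import tensorflow as tf  # imported lazily so the parser works without TensorFlow

    return cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
```

Calling `installed_tf_abi()` in the alpha and nightly environments and comparing the results would show whether the two builds agree on the flag.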

Some things to note in the investigation:

Has there been a recent change in how tf2-nightly is packaged compared to tf2-alpha? Going forward we can mention that addons should be used with tf2-alpha, and just use nightly for testing... but there seems to be an underlying issue that needs to be fixed.

Colab Notebooks:

Addons-0.1.1 + Alpha
Addons-0.1.1 + Nightly
Addons-0.2.0 + Alpha
Addons-0.2.0 + Nightly

cc @gunan @yifeif

karmel commented 5 years ago

@yifeif @goldiegadde -- any clues here?

seanpmorgan commented 5 years ago

I don't think this is related to #130, but thought I'd mention it so we can keep track of issues we're having linking with tf2-nightly.

seanpmorgan commented 5 years ago

Pretty sure this is related to: https://github.com/tensorflow/tensorflow/issues/27067

SetShapeFn has issues when compiled with gcc5. I have since rebuilt 0.2.1 using gcc4.8 so the original issue is likely solved, but there is a new issue explained below.

seanpmorgan commented 5 years ago

Marking this as blocked because tf2-nightly and tf2-alpha have different shared library names so our builds wouldn't work on both of them regardless.

tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so: cannot open shared object file: No such file or directory

Described in #130
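To see which shared library name a given TensorFlow install actually ships, the library directory can be listed. A rough sketch, assuming `tf.sysconfig.get_lib()` is available in the installed version; `framework_libraries` and `installed_framework_libraries` are hypothetical helper names.

```python
from pathlib import Path


def framework_libraries(lib_dir):
    """List file names in lib_dir that look like the TensorFlow framework library."""
    return sorted(p.name for p in Path(lib_dir).glob("libtensorflow_framework.so*"))


def installed_framework_libraries():
    """List the framework library files shipped by the installed TensorFlow."""
    import tensorflow as tf  # imported lazily; requires TensorFlow to be installed

    return framework_libraries(tf.sysconfig.get_lib())
```

Running `installed_framework_libraries()` in each environment shows whether the install provides `libtensorflow_framework.so` or a versioned name, which is the name a prebuilt addons package has to match.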

alew3 commented 5 years ago

Is there any workaround to get tensorflow addons working with tf2-nightly?

seanpmorgan commented 5 years ago

Build from source would be the only way for now (tip: use the docker image recommended in the contributing guide). As soon as we can get a tf2-beta version that includes the new so versioning then it'll be compatible with both again.

Just a note though: there was a breaking change last night which will be fixed once #135 gets merged

seanpmorgan commented 5 years ago

Alternatively we could set up a nightly addons package, which would work. But that requires #76 to be finished.

alew3 commented 5 years ago

@seanpmorgan I was trying to build from sources with the dockerfile, but it is building Python v2.7, how do I change this to Python 3.6? I started editing files, but it was so many changes I think I was doing it wrong.

seanpmorgan commented 5 years ago

Ah yeah, apologies we have no automated build script yet for you to see an example (should soon though).

You can install python3.6 and set your paths appropriately before building. Alternatively you can just install Miniconda and create a new env. By switching with conda activate it'll set the paths correctly and you can build as py36

alew3 commented 5 years ago

@seanpmorgan I thought this was going to be simpler, but after trying many things I'm now stuck :-(. This is what I did. It is building, but I get an error: "tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory"

$ git clone https://github.com/tensorflow/addons.git
$ cd addons

$ docker run --rm -it -v ${PWD}:/addons -w /addons tensorflow/tensorflow:nightly-custom-op /bin/bash

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
$ chmod +x miniconda.sh
$ ./miniconda.sh (using path => /addons/miniconda/)
$ export PATH="/addons/miniconda/bin:$PATH"
$ conda create -n py36 python=3.6
$ source activate py36

$ ./configure.sh
$ bazel build build_pip_pkg

This fails to build, so I edited the generated file on line 3799:

vi /addons/bazel-addons/external/local_config_tf/BUILD

from:

genrule(
    name = "libtensorflow_framework.so.1",
    outs = [
        "libtensorflow_framework.so.1",
    ],
    cmd = """
cp -f "/usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2" "$(@D)/libtensorflow_framework.so.1"
    """,
)

to:

genrule(
    name = "libtensorflow_framework.so.1",
    outs = [
        "libtensorflow_framework.so.2",
    ],
    cmd = """
cp -f "/usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2" "$(@D)/libtensorflow_framework.so.2"
    """,
)

Running again, it now works:

$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg artifacts

=> success!!

$ exit (to leave docker)
$ cd artifacts
$ pip install --upgrade --force-reinstall tensorflow_addons-0.2.0.dev0-cp36-cp36m-linux_x86_64.whl

When I try:

import tensorflow as tf
import tensorflow_addons as tfa

the fatal error:

tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory
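When a wheel is built in one environment (here the docker container) and installed in another, it can help to first rule out an interpreter mismatch by reading the tags in the wheel filename. The sketch below assumes the standard wheel filename layout and a CPython interpreter; `wheel_tags` and `matches_running_interpreter` are hypothetical helper names. It does not check which TensorFlow build is installed on the host, which is the more likely cause of the error above.

```python
import sys


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel filename."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel filename")
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[:-len(suffix)].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in wheel filename")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def matches_running_interpreter(filename):
    """Check whether the wheel's Python tag matches this CPython interpreter."""
    python_tag, _, _ = wheel_tags(filename)
    running = "cp{}{}".format(*sys.version_info[:2])
    return running in python_tag.split(".")


print(wheel_tags("tensorflow_addons-0.2.0.dev0-cp36-cp36m-linux_x86_64.whl"))
# ('cp36', 'cp36m', 'linux_x86_64')
```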

seanpmorgan commented 5 years ago

@alew3 sorry for the troublesome experience; we're making some daily breaking changes against nightly. Can you check the daily version of the tf2-alpha you install? Yesterday the framework was named so.1.

It should be 20190403: https://pypi.org/project/tf-nightly-2.0-preview/#history
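The nightly date can be read from the version string of the installed package. A small sketch, assuming nightly version strings embed the date as `dev` followed by eight digits (for example `2.0.0-dev20190403`); `nightly_build_date` is a hypothetical helper name.

```python
import datetime
import re


def nightly_build_date(version):
    """Return the date embedded in a nightly version string, or None if absent."""
    match = re.search(r"dev(\d{8})", version)
    if match is None:
        return None
    return datetime.datetime.strptime(match.group(1), "%Y%m%d").date()


print(nightly_build_date("2.0.0-dev20190403"))  # 2019-04-03
```

Passing `tf.__version__` from the environment in question would show which nightly is actually installed.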

alew3 commented 5 years ago

@seanpmorgan thanks for your help! I just installed the latest tf-nightly and now it doesn't even load tensorflow, so I guess things are a bit turbulent at the moment .. :-) I'm going to revert to an older version.

seanpmorgan commented 5 years ago

Closing this issue as there is a lot of info that is not relevant anymore. Will re-make shortly.

golden0080 commented 5 years ago

Are there any updates on this issue? Custom ops built with gcc 7.4 seem to hit this issue as well. I get a runtime error when loading custom ops built with gcc 7.4 (tf 1.14.0 was built with gcc 7.4 too):

...
  File "~/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ~/custom_ops/voxelization.so: undefined symbol: _ZN10tensorflow12OpDefBuilder10SetShapeFnEPFNS_6StatusEPNS_15shape_inference16InferenceContextEE

YanzheL commented 4 years ago

Same runtime issue when building from source on the v1.15.2 tag with no custom op, using gcc 7.5.0.

Traceback (most recent call last):
  File "BERT_NER.py", line 625, in <module>
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "BERT_NER.py", line 505, in main
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/util/lazy_loader.py", line 63, in __getattr__
    return getattr(module, item)
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/opt/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/__init__.py", line 54, in <module>
    from tensorflow.contrib import image
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/__init__.py", line 55, in <module>
    from tensorflow.contrib.image.python.ops.dense_image_warp import dense_image_warp
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/__init__.py", line 57, in <module>
    from tensorflow.contrib.image.python.ops.distort_image_ops import adjust_hsv_in_yiq
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/python/ops/distort_image_ops.py", line 29, in <module>
    resource_loader.get_path_to_datafile('_distort_image_ops.so'))
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/util/loader.py", line 56, in load_op_library
    ret = load_library.load_op_library(path)
  File "/datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /datastore/alt-rootfs/home/zhanghanwen/ner_bert_celoss/venv/lib/python3.7/site-packages/tensorflow_core/contrib/image/python/ops/_distort_image_ops.so: undefined symbol: _ZN4absl5Mutex10ReaderLockEv

seanpmorgan commented 4 years ago

Hi @YanzheL. TF-Addons is only compatible with TF2.X and so building against 1.15.2 is unsupported. It looks as though you're getting this symbol mismatch on tf.contrib so you may want to raise this issue on the tensorflow/tensorflow repo. Which part of addons were you trying to use?
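A quick guard for the TF2.X requirement could look like the sketch below. It only inspects the major version in a version string; `major_version` and `addons_supported` are hypothetical helper names, not part of TensorFlow or Addons.

```python
def major_version(version):
    """Return the leading major version number of a dotted version string."""
    return int(version.split(".", 1)[0])


def addons_supported(tf_version):
    """True when the TensorFlow version string is 2.x or later."""
    return major_version(tf_version) >= 2


print(addons_supported("1.15.2"))        # False
print(addons_supported("2.0.0-alpha0"))  # True
```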