Segmentation fault USE-multilingual TF2.0

brenting commented 5 years ago

I am getting a segmentation fault like in #345.

Debian 10 (Buster)
Python 3.6/3.7

tensorboard==2.0.1
tensorflow==2.0.0
tensorflow-estimator==2.0.1
tensorflow-hub==0.7.0
sentencepiece==0.1.84
tf-sentencepiece==0.1.84

It happens when I do:

import faulthandler
faulthandler.enable()
import tensorflow as tf
import tensorflow_hub as hub
import tf_sentencepiece

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")

The output is as follows:

Fatal Python error: Segmentation fault

Current thread 0x00007fd05a90f740 (most recent call first):
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 501 in _import_graph_def_internal
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 405 in import_graph_def
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507 in new_func
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809 in import_scoped_meta_graph_with_return_elements
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1477 in _import_meta_graph_with_return_elements
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 89 in load_graph
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/eager/wrap_function.py", line 89 in wrapped
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/eager/wrap_function.py", line 83 in __call__
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/framework/func_graph.py", line 915 in func_graph_from_py_func
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/eager/wrap_function.py", line 598 in wrap_function
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 189 in load
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 239 in load
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/saved_model/load.py", line 548 in load_internal
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_core/python/saved_model/load.py", line 517 in load
  File "<project>/.venv/lib/python3.7/site-packages/tensorflow_hub/module_v2.py", line 95 in load
  File "run.py", line 7 in <module>
Segmentation fault

I have to use (tf-)sentencepiece 0.1.84 because of a bug in 0.1.83 that is addressed in https://github.com/google/sentencepiece/pull/417. (tf-)sentencepiece 0.1.84 is not yet available via pip, but the problem can still be reproduced in Google Colab by installing the release wheels manually. The following code makes the Google Colab kernel crash:

!wget -q https://github.com/google/sentencepiece/releases/download/v0.1.84/tf_sentencepiece-0.1.84-py2.py3-none-manylinux1_x86_64.whl
!wget -q https://github.com/google/sentencepiece/releases/download/v0.1.84/sentencepiece-0.1.84-cp36-cp36m-manylinux1_x86_64.whl

%tensorflow_version 2.x
!pip3 install --quiet tensorflow-hub
!pip3 install --quiet sentencepiece-0.1.84-cp36-cp36m-manylinux1_x86_64.whl
!pip3 install --quiet tf_sentencepiece-0.1.84-py2.py3-none-manylinux1_x86_64.whl

import tensorflow as tf
import tensorflow_hub as hub
import tf_sentencepiece

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")
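Because the segmentation fault takes the whole interpreter (or Colab kernel) down with it, it can help to run the load in a child process and inspect how that process ended. The following is only a sketch: `run_isolated`, `describe` and `LOAD_SNIPPET` are hypothetical helper names, not part of TensorFlow or TF Hub, and the signal interpretation assumes a POSIX system.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run `snippet` in a fresh interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on SIGSEGV
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short description."""
    # On POSIX, a negative return code means the child was killed by that signal.
    if returncode < 0:
        return "killed by signal {}".format(-returncode)
    return "exited with code {}".format(returncode)


# The failing load from this report, kept as a string so that it only
# ever runs inside the child process.
LOAD_SNIPPET = """
import tensorflow as tf
import tensorflow_hub as hub
import tf_sentencepiece
hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")
"""
```

Calling `run_isolated(LOAD_SNIPPET)` and passing the return code to `describe` should then report the crash without killing the parent session, and the returned stderr text should contain the faulthandler traceback.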

rmothukuru commented 5 years ago

I could reproduce the issue in Google Colab: the kernel crashes with the above code. Here is the GitHub Gist of the Colab notebook.

brenting commented 5 years ago

The crash does not occur with the new USE-multilingual model: https://tfhub.dev/google/universal-sentence-encoder-multilingual/2

arnoegw commented 5 years ago

Thanks, @brenting, for the update! I guess that is the preferred solution then:

For TF2.0, use the model https://tfhub.dev/google/universal-sentence-encoder-multilingual/2 in the new, native TF2 SavedModel format together with sentencepiece==0.1.84 and tf-sentencepiece==0.1.84 (for now with the manual download you describe).
For TF1.14 (and presumably TF1.15 in its native TF1 mode), use the resolution of issue https://github.com/tensorflow/hub/issues/345: model https://tfhub.dev/google/universal-sentence-encoder-multilingual/1 with tf-sentencepiece>=0.1.83 and sentencepiece>=0.1.83.

It's a bit unfortunate that all these versions need to be aligned, and that there are no useful error messages, but that isn't something we can fix in the TF Hub code.
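Since a mismatch shows up as a crash rather than a readable error, a small pre-flight check on the installed package versions may save some time. This is a minimal sketch and not something TF Hub provides: `check_pins`, `installed_versions` and `EXAMPLE_PINS` are hypothetical names, the pins are just the sentencepiece pair mentioned above, and `importlib.metadata` assumes Python 3.8 or newer.

```python
def check_pins(installed, required):
    """Return readable problems; an empty list means every pin matches.

    installed: dict of package name -> installed version (None if missing)
    required:  dict of package name -> exact version expected
    """
    problems = []
    for name, wanted in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            problems.append("{} is not installed (need {})".format(name, wanted))
        elif found != wanted:
            problems.append("{} is {} (need {})".format(name, found, wanted))
    return problems


def installed_versions(names):
    """Look up installed versions via importlib.metadata (Python 3.8+)."""
    from importlib import metadata

    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example pins only; substitute the combination that applies to your setup.
EXAMPLE_PINS = {
    "sentencepiece": "0.1.84",
    "tf-sentencepiece": "0.1.84",
}
```

Running `check_pins(installed_versions(EXAMPLE_PINS), EXAMPLE_PINS)` before loading the model gives a list of mismatches to print, which is at least more informative than a bare segmentation fault.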

Adding vbardiovskyg; please close if you see nothing else.

brenting commented 5 years ago

According to the documentation accompanying the new USE-multilingual-2 model, the model no longer depends on sentencepiece and tf-sentencepiece and instead requires tensorflow_text>=2.0.0rc0.

For the older versions of TF, due to the bug in https://github.com/google/sentencepiece/pull/417, it is likely that only the exact TF versions found on the tf-sentencepiece v0.1.83 page work for USE-multilingual-1. This will be the case until the fix is implemented in older packages of (tf-)sentencepiece.

Long story short. For now:

For TF2.0, use the USE-multilingual-2 model with tensorflow_text>=2.0.0rc0 and tensorflow-hub as described in the documentation.
For older versions of TF, install sentencepiece==0.1.83 and tf-sentencepiece==0.1.83, pick one of the TF versions listed on the tf-sentencepiece v0.1.83 page, and install that exact TF version. Then load the model via tensorflow-hub.

Also change ["output"] to ["outputs"] in the example on the USE-multilingual-2 page.
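For reference, loading USE-multilingual-2 under TF2.0 with the corrected key could look roughly like the sketch below. The helper names `load_model` and `embed` are hypothetical, and the assumption that the model can be called directly on a list of strings and returns a dict follows the model page example as corrected above, so please check it against the current documentation.

```python
MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/2"


def load_model(url=MODEL_URL):
    """Load the TF2 SavedModel from TF Hub."""
    # Imports are deferred so this file can be imported without TensorFlow.
    import tensorflow_hub as hub
    # Imported only for its side effect: it registers the text ops
    # that the saved model refers to.
    import tensorflow_text  # noqa: F401

    return hub.load(url)


def embed(model, sentences):
    """Call the model and pick the embeddings out of the returned dict."""
    result = model(sentences)
    return result["outputs"]  # "outputs", not "output"
```

With the packages installed, `embed(load_model(), ["Hello world", "Hallo Welt"])` should return one embedding per sentence. Keeping the model as a parameter of `embed` also makes it easy to swap in a stub when TensorFlow is not available.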

PS: Although tf-sentencepiece==0.1.84 is available, tf-sentencepiece==0.1.83 is still the default version in pip. USE-multilingual-1 with tf-sentencepiece v0.1.84 potentially only works for TF1.13.1, TF1.14.0 and TF2.0.0 until the previously described fix is implemented (or fixed by hand).

PPS: Save yourself some trouble and go for USE-multilingual-2 with TF2.0 :).

jaxlaw commented 4 years ago

https://tfhub.dev/google/universal-sentence-encoder-multilingual/3 now supports TF2.0 and uses tensorflow_text instead of tf_sentencepiece.

arnoegw commented 4 years ago

Nice.

With that, can we close this issue?

brenting commented 4 years ago

> The crash does not occur with the new USE-multilingual model: https://tfhub.dev/google/universal-sentence-encoder-multilingual/2

The issue was already solved when USE-multilingual-2 came out, which supports TF2.0 and uses tensorflow_text. If you do not intend to add TF2.0 support to USE-multilingual-1 retroactively, then this issue can indeed be closed.

arnoegw commented 4 years ago

Thanks for confirming.

Module version 1 won't be changed, because TF Hub module contents are immutable (to avoid consistency issues with caching etc.).

tensorflow / hub

Segmentation fault USE-multilingual TF2.0 #404