tensorflow / text

Making text a first-class citizen in TensorFlow.
https://www.tensorflow.org/beta/tutorials/tensorflow_text/intro
Apache License 2.0

Seg fault when using TF Text (built from source) as a custom keras layer #424

Closed xwli-chelsea closed 3 years ago

xwli-chelsea commented 4 years ago

Hi,

We've built tensorflow-text (2.2.0) from source using the scripts in ./oss_scripts. Our goal is to use the BertTokenizer as a Keras layer. There were issues with saving such models, as discussed here: https://github.com/tensorflow/text/issues/224. We followed one of the workarounds provided in that thread (with code in the notebook: https://github.com/tensorflow/text/issues/224#issuecomment-644631076), and it works fine if we install TF Text with pip install tensorflow-text==2.2.0.

However, when building from source, I get a segmentation fault with the exact same code at tokens = self.bert_tokenizer.tokenize(text).

I also tested using the tokenizer directly without a custom layer. It works fine and I can get output tokens.

I'm not sure what I did wrong here. Do I also need to build tensorflow from source instead of using the pip installed version? Could you please help with this issue? Any insight is appreciated!

xwli-chelsea commented 4 years ago

Adding more details:

I also tested this example here: https://github.com/tensorflow/text/blob/master/examples/keras_example_174.ipynb

As before, I get a segmentation fault with the source build but no error when installing the published wheel. I'm using a Redhat 7.6 (Maipo) machine, with bazel==3.5.0.
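A crash like this is easier to compare across the two installs if the failing code runs in a child process, so the segfault does not take down the notebook or shell. The following is a minimal sketch, not something from the thread: run_isolated is a hypothetical helper, and the snippet passed to it is a placeholder for the actual layer code. It uses Python's built-in faulthandler option so the child prints a Python traceback when it receives SIGSEGV.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter and report how it exited.

    Returns (status, stderr), where status is "ok", "error" (non-zero exit
    code), or the name of the signal that killed the child, e.g. "SIGSEGV".
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok", proc.stderr
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return signal.Signals(-proc.returncode).name, proc.stderr
    return "error", proc.stderr


if __name__ == "__main__":
    # Placeholder snippet (assumption): replace with the failing tokenizer code.
    status, stderr = run_isolated("import tensorflow_text")
    print(status)
    print(stderr)
```

Running the same snippet once in the environment with the pip wheel and once in the one with the source build shows whether the crash is tied to the build rather than to the model code.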

broken commented 3 years ago

Are you building with the same system that you are running the package on? Building on one system and running on another could be the issue.

Segfaults like this are often caused by ABI mismatches, which could result from the above. I believe TF provides a Docker image that you can try building on if you want a more universal binary. Though from issues I've seen others have, I don't know how up to date they keep this image, and it may have changed since they built the 2.2 branch. It could be worth trying though.
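One quick check for an ABI mismatch is to look at the compile flags the installed TensorFlow reports and compare them with the flags used for the TF Text build. The sketch below is only an illustration: cxx11_abi is a hypothetical helper, and the _GLIBCXX_USE_CXX11_ABI flag is just one possible source of mismatch, not something confirmed as the cause in this thread. It relies on tf.sysconfig.get_compile_flags(), and falls back to a message if TensorFlow is not installed.

```python
import re


def cxx11_abi(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value found in compiler flags, or None."""
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        # Flags the installed TensorFlow binary was compiled with.
        flags = tf.sysconfig.get_compile_flags()
        print("TensorFlow version:", tf.__version__)
        print("C++11 ABI flag:", cxx11_abi(flags))
```

If the value printed in the runtime environment differs from the one used in the build environment, the custom ops in the source-built package were likely compiled against a different C++ ABI than the TensorFlow they are loaded into.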

Another thing to try is using the manylinux2020 toolchain when building:

bazel build --config=manylinux2020 oss_scripts/pip_package:build_pip_package

P.S. Apologies for the delayed reply; the holidays and the ongoing 2.4 release have diverted a lot of our attention.

xwli-chelsea commented 3 years ago

Thanks @broken for the detailed response. Yes, after building TF Text in the same environment we use for our TF build, the segfault went away. So it did come from differences in the Docker image we used. Will close the issue, and happy holidays!