craffel opened this issue 4 years ago
Are you on MacOS or Linux?
This laptop is running macOS.
To follow up, setting num_parallel_calls to anything 2 or above also causes the same hang, though it is intermittent (i.e. sometimes it iterates over the dataset without hanging, sometimes it doesn't):
import tensorflow as tf
import tensorflow_text

with tf.io.gfile.GFile("gs://t5-data/vocabs/cc_all.32000/sentencepiece.model", "rb") as f:
    tokenizer = tensorflow_text.SentencepieceTokenizer(model=f.read())
ds = tf.data.Dataset.from_tensor_slices({"a": ["b"]*10})
ds = ds.map(lambda ex: tokenizer.tokenize(ex["a"]), num_parallel_calls=2)
for ex in ds.as_numpy_iterator():
    print(ex)
So it does not have anything to do with tf.data.experimental.AUTOTUNE; it seems to be an interaction between a parallel map and tokenize.
I am having the same issue. Any update?
I believe this is fixed with https://github.com/tensorflow/text/commit/52f9004b4c34d0aea3c55dc3eb41a199946b8550
However, we haven't been able to get TF Text to build with this on Windows yet. (The virtual function table is missing for AttrValue in the TF lib). I've been actively working with the TF Infra team to get TF exporting this correctly, but it's been difficult.
Hi, I will explain what I am facing now.
In the Cloud TPU VM, the provided version of TF is 2.6.0. If I do pip install tf-text, it tries to overwrite that.
So I built it from scratch, but hit many errors, especially "local_config_tf" not found. The fix was replacing
@._config_tf//:libtensorflow_framework -> @._tensorflow//tensorflow/core:framework
@._config_tf//:tf_header_lib -> @._tensorflow//tensorflow/core:lib
in two files:
a. tftext.bzl
b. third_party/sentencepiece/processor.patch
The build succeeds, all good. But when I use the SentencepieceTokenizer or BertTokenizer inside a tf.data.Dataset.map function, I get "Segmentation fault (core dumped)".
I ran "oss_configure/.run_tests.sh": 6 tokenizer-related tests failed, and the remaining 274 passed.
Any suggestion or help would be appreciated.
Thanks
Having said that, outside map it works fine. But I need it inside map to do preprocessing on the fly.
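For illustration, "outside map" means a plain eager call like the following (a minimal sketch; the model path is a placeholder, not something taken from this thread):

import tensorflow as tf
import tensorflow_text as tf_text

# Placeholder path -- any valid SentencePiece model proto works here.
sp_model = tf.io.gfile.GFile("sample/spiece.model", "rb").read()
tokenizer = tf_text.SentencepieceTokenizer(model=sp_model, out_type=tf.int32)

# Calling tokenize eagerly, outside tf.data, returns a RaggedTensor and works fine;
# the crash only appears when the same call runs inside Dataset.map.
print(tokenizer.tokenize(["This is text 1", "This is text2"]))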
tf 2.6 isn't released yet; is this tf-nightly? Try installing tensorflow-text-nightly instead.
Yes, I have the same assumption that 2.6.0 isn't released. :-) But the version of TensorFlow in the TPU VM is 2.6.0.
If I install tensorflow-text-nightly, it replaces the existing TF version (2.6.0) with 2.6.0rc0, and then the TPU devices are no longer recognized. So I was restricted to the provided TF version, and I built all the .so files for tf-text locally. Only SentencepieceTokenizer and BertTokenizer are not working inside tf.data.Dataset.map.
The tensorflow-text-nightly package does not have safety restrictions on the version of TF you have installed, so it will not replace the existing TF version.
Thanks. I tried tensorflow-text-nightly as you suggested, but got the following error:
NotFoundError Traceback (most recent call last)
<ipython-input-1-7af2306084ed> in <module>
----> 1 import tensorflow_text as tf_text
~/.local/lib/python3.8/site-packages/tensorflow_text/__init__.py in <module>
19 # pylint: disable=wildcard-import
20 from tensorflow_text.python import keras
---> 21 from tensorflow_text.python import metrics
22 from tensorflow_text.python.ops import *
23
~/.local/lib/python3.8/site-packages/tensorflow_text/python/metrics/__init__.py in <module>
18
19 # pylint: disable=wildcard-import
---> 20 from tensorflow_text.python.metrics.text_similarity_metric_ops import *
21
22 # Public symbols in the "tensorflow_text.metrics" package.
~/.local/lib/python3.8/site-packages/tensorflow_text/python/metrics/text_similarity_metric_ops.py in <module>
26 from tensorflow.python.framework import load_library
27 from tensorflow.python.platform import resource_loader
---> 28 gen_text_similarity_metric_ops = load_library.load_op_library(resource_loader.get_path_to_datafile('_text_similarity_metric_ops.so'))
29
30
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/load_library.py in load_op_library(library_filename)
56 RuntimeError: when unable to load the library or get the python wrappers.
57 """
---> 58 lib_handle = py_tf.TF_LoadLibrary(library_filename)
59 try:
60 wrappers = _pywrap_python_op_gen.GetPythonWrappers(
NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory
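One quick way to see where TensorFlow's framework library actually lives (a small diagnostic sketch using the standard tf.sysconfig helper; not something that was run in this thread):

import os
import tensorflow as tf

# Directory where TensorFlow keeps libtensorflow_framework.so.2; custom-op
# libraries such as the tf-text kernels need to resolve it from here.
lib_dir = tf.sysconfig.get_lib()
print("TF library dir:", lib_dir)
print([f for f in os.listdir(lib_dir) if f.startswith("libtensorflow_framework")])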
If this tokenizer works, then I can train my model on TPU without creating TFRecords. That's a significant breakthrough, to be frank. :)
@s4sarath Ugh! Apologies; there's a lot going on, and I just realized you are trying to install on Cloud TPU. TF Text should be available on Cloud TPU by default. Can you test without trying to install a new version of tensorflow_text, and if it isn't available, create a new issue?
As an added benefit, the fix I referred to above should actually be available in the cloud instances already, as it's just our pip packages which are behind due to the Windows build issues.
No worries Robert. Maybe I wasn't clear enough.
By default, tf-text is not available in the TPU VM. I checked on v3-8 alpha v2 machines in the europe-west-4 region. I mailed the TRC team regarding support a few days ago and still haven't heard from them.
That's the reason I built it using Bazel. Except for the tokenizers, all other ops work fine inside tf.data.
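For example, a non-tokenizer op of the kind that maps without problems (an illustrative sketch, not code from this thread; case_fold_utf8 is just one representative op):

import tensorflow as tf
import tensorflow_text as tf_text

# A representative non-tokenizer tf.text op: Unicode case folding maps fine
# inside tf.data, unlike the SentencePiece/Bert tokenizers described above.
ds = tf.data.Dataset.from_tensor_slices(["This Is TEXT 1", "This Is Text2"])
ds = ds.map(tf_text.case_fold_utf8)
for ex in ds.as_numpy_iterator():
    print(ex)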
Sample code to reproduce:
import tensorflow as tf
import tensorflow_text as tf_text

model_file_path = 'sample/spiece.model'
dtype = tf.int32
nbest_size = 0
alpha = 1.0

def _create_tokenizer(model_serialized_proto, dtype, nbest_size, alpha):
    return tf_text.SentencepieceTokenizer(
        model=model_serialized_proto,
        out_type=dtype,
        nbest_size=nbest_size,
        alpha=alpha)

model_serialized_proto = tf.io.gfile.GFile(model_file_path, "rb").read()
tokenizer_sp = _create_tokenizer(model_serialized_proto, dtype, nbest_size, alpha)

def map_tokenize(text):
    return tokenizer_sp.tokenize(text)

# Read wikipedia data
dataset = tf.data.Dataset.from_tensor_slices(['This is text 1', 'This is text2', 'This is text3', 'This is text4'])
ds = dataset.map(map_tokenize)
The error in the TPU VM is as follows, the moment the map + tokenizer executes:
https://symbolize.stripped_domain/r/?trace=7f74312c64bf,7f7655bac20f,7f74312c6601,7f74312c7326,7f7352004f3b,7f74312bdf5c,7f74310c7368,7f74310c8aa0,7f74310c97e7,7f7425f63524,7f7412c6a156,7f7412c7147b,5f2fb8,902aff&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f7421f01000-7f7435c00270
*** SIGSEGV (@0x14), see gl__________25#s15 received by PID 72222 (TID 72222) on cpu 56; stack trace: ***
PC: @ 0x7f74312c64bf (unknown) tensorflow::AttrSlice::Find()
@ 0x7f74213f71e0 976 (unknown)
@ 0x7f7655bac210 (unknown) (unknown)
@ 0x7f74312c6602 80 tensorflow::AttrSlice::Find()
@ 0x7f74312c7327 64 tensorflow::GetNodeAttr()
@ 0x7f7352004f3c 112 std::_Function_handler<>::_M_invoke()
@ 0x7f74312bdf5d 64 tensorflow::shape_inference::InferenceContext::Run()
@ 0x7f74310c7369 544 tensorflow::ShapeRefiner::RunShapeFn()
@ 0x7f74310c8aa1 352 tensorflow::ShapeRefiner::AddNodeInternal()
@ 0x7f74310c97e8 32 tensorflow::ShapeRefiner::AddNode()
@ 0x7f7425f63525 160 TF_FinishOperation
@ 0x7f7412c6a157 144 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()
@ 0x7f7412c7147c 720 pybind11::cpp_function::dispatcher()
@ 0x5f2fb9 (unknown) PyCFunction_Call
@ 0x902b00 (unknown) (unknown)
https://symbolize.stripped_domain/r/?trace=7f74312c64bf,7f74213f71df,7f7655bac20f,7f74312c6601,7f74312c7326,7f7352004f3b,7f74312bdf5c,7f74310c7368,7f74310c8aa0,7f74310c97e7,7f7425f63524,7f7412c6a156,7f7412c7147b,5f2fb8,902aff&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f7421f01000-7f7435c00270,ca1b7ab241ee28147b3d590cadb5dc1b:7f74146f8000-7f742172ab20
E0714 03:05:42.738522 72222 coredump_hook.cc:292] RAW: Remote crash data gathering hook invoked.
E0714 03:05:42.738555 72222 coredump_hook.cc:384] RAW: Skipping coredump since rlimit was 0 at process start.
E0714 03:05:42.738568 72222 client.cc:222] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0714 03:05:42.738574 72222 coredump_hook.cc:447] RAW: Sending fingerprint to remote end.
E0714 03:05:42.738586 72222 coredump_socket.cc:124] RAW: Stat failed errno=2 on socket /var/google/services/logmanagerd/remote_coredump.socket
E0714 03:05:42.738598 72222 coredump_hook.cc:451] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] Missing crash reporting socket. Is the listener running?
E0714 03:05:42.738607 72222 coredump_hook.cc:525] RAW: Discarding core.
E0714 03:05:42.954572 72222 process_state.cc:771] RAW: Raising signal 11 with default behavior
Segmentation fault (core dumped)
One more update: I tried a hack to wrap it inside tf.py_function, like this:
def map_tokenize(text):
    # text = text.numpy().decode().strip()
    return tokenizer_sp.tokenize(text).merge_dims(-1, 1).to_tensor()

def map_tokenize_py(text):
    input_ids = tf.py_function(map_tokenize, [text], tf.int32)
    return [input_ids]

# Read wikipedia data
dataset = tf.data.Dataset.from_tensor_slices(['This is text 1', 'This is text2 waw wdxce', 'This is text3', 'This is text4'])
dataset = dataset.batch(2)
ds = dataset.map(map_tokenize_py)
Now it works (it is slow due to the py_function). But the moment we wrap the dataset in tf.strategy.experimental_distribute, I get the following error. Somehow, SentencePiece is not being placed on the multiple TPU devices, I guess (I am no expert).
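For context, the distribution step was presumably along these lines (a hedged sketch; the exact strategy setup was not shown in the thread):

import tensorflow as tf

# Assumed TPU strategy setup; on a Cloud TPU VM the resolver is typically 'local'.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Wrapping the py_function-based dataset for distribution is the step that
# triggers the NotFoundError below about the SentencepieceOp resource.
dist_ds = strategy.experimental_distribute_dataset(ds)
for batch in dist_ds:
    pass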
2021-07-14 04:00:46.163813: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at sentencepiece_kernels.cc:275 : Not found: Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
2021-07-14 04:00:46.164110: W tensorflow/core/framework/op_kernel.cc:1680] Unknown: NotFoundError: Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist. [Op:SentencepieceTokenizeOp]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 247, in __call__
return func(device, token, args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 135, in __call__
ret = self._func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 645, in wrapper
return func(*args, **kwargs)
File "<ipython-input-24-9efe8b362f62>", line 3, in map_tokenize
return tokenizer_sp.tokenize(text).merge_dims(-1, 1).to_tensor()
File "/home/sidhu/Libraries/text/tensorflow_text/python/ops/sentencepiece_tokenizer.py", line 151, in tokenize
gen_sentencepiece_tokenizer.sentencepiece_tokenize_op(
File "<string>", line 175, in sentencepiece_tokenize_op
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 6901, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist. [Op:SentencepieceTokenizeOp]
You're right. Cloud TPU has tf.Text by default, but it is not there with the custom TPU VMs. I talked with the Cloud TPU team, and the custom op support should be better with 2.6. Currently it's a monolithic build, which is probably why tf.Text couldn't find the libtensorflow_framework.so file.
Regarding your latest error, I'll try to get somebody on the team to look at it if I can't find more time myself.
Thanks Robert. Much appreciated. 👍
On my local machine, the following code snippet hangs after printing out a few examples (and cannot be killed via a keyboard interrupt; it must be sigkilled):
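Presumably the snippet was along these lines, given the later comment about AUTOTUNE and tokenize (a hedged reconstruction, not necessarily the author's exact code):

import tensorflow as tf
import tensorflow_text

with tf.io.gfile.GFile("gs://t5-data/vocabs/cc_all.32000/sentencepiece.model", "rb") as f:
    tokenizer = tensorflow_text.SentencepieceTokenizer(model=f.read())
ds = tf.data.Dataset.from_tensor_slices({"a": ["b"] * 10})
ds = ds.map(lambda ex: tokenizer.tokenize(ex["a"]),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
for ex in ds.as_numpy_iterator():
    print(ex)  # hangs after printing a few examples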
This does not freeze:
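(presumably the same pipeline with a plain serial map; a hedged guess at the missing snippet)

# Same tokenizer and data as above, but with a plain serial map -- this
# variant iterates to completion.
ds = tf.data.Dataset.from_tensor_slices({"a": ["b"] * 10})
ds = ds.map(lambda ex: tokenizer.tokenize(ex["a"]))
for ex in ds.as_numpy_iterator():
    print(ex)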
nor does this:
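(presumably a parallel map that does not call tokenize, e.g.; again a hedged guess)

# Same data, keeping num_parallel_calls=AUTOTUNE but mapping a function that
# does not call tokenize -- this also iterates fine.
ds = tf.data.Dataset.from_tensor_slices({"a": ["b"] * 10})
ds = ds.map(lambda ex: tf.strings.length(ex["a"]),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
for ex in ds.as_numpy_iterator():
    print(ex)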
It appears that there is some problematic interaction between setting num_parallel_calls=tf.data.experimental.AUTOTUNE in a map and tensorflow_text.SentencepieceTokenizer.tokenize. Note that this is only on my local machine; it does not appear to happen, e.g., in a public Colab kernel. The Python environment I am using was created via pyenv, using Python 3.8.5 and tensorflow/tensorflow-text==2.3.0. Here is the output of pip freeze.