tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT and deep learning architectures
Apache License 2.0

Code crashing with TPU on AI Platform while running on Colab TPU #60

Closed tarrade closed 4 years ago

tarrade commented 4 years ago

TensorFlow version mismatch between the CPU (VM) and the TPU:

dataset.map(pp.parse_tfrecord_glue_files, num_parallel_calls=tf.data.experimental.AUTOTUNE)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1628, in map
    preserve_cardinality=True)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4050, in __init__
    **self._flat_structure)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4692, in parallel_map_dataset
    _ops.raise_from_not_ok_status(e, name)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: 'ParallelInterleaveDatasetV3' is neither a type of a primitive operation nor a name of a function registered in binary running on n-c3e638d3-w-0. One possible root cause is the client and server binaries are not built with the same version. Please make sure the operation or function is registered in the binary running in this process. [Op:ParallelMapDataset]

tarrade commented 4 years ago

Issue fixed: an extra package and some configuration are needed. The root cause was a TensorFlow version mismatch between the VM and the TPU cluster.

ZyingHang commented 4 years ago

Hi, can you tell me what package and config you used? I have the same problem.

tarrade commented 4 years ago

Hi @ZyingHang,

In my case I was using TensorFlow 2.2.0 (I didn't try with TF 2.3.0). Since then, the GCP documentation has been updated: https://cloud.google.com/tpu/docs/version-switching

you need this:

import tensorflow as tf
from cloud_tpu_client import Client

# Ask the TPU to run the same TensorFlow version as the client VM,
# restarting the TPU runtime only if the versions differ.
c = Client()
c.configure_tpu_version(tf.__version__, restart_type='ifNeeded')

This will update the TF version on the TPU cluster if you are using a different version.
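For context, here is a minimal sketch of where this call typically sits in the TPU setup on AI Platform with TF 2.2; the resolver and strategy lines are the standard TensorFlow TPU initialization, not something specific to this repo, and the TPU name is assumed to come from the TPU_NAME environment variable set by AI Platform:

import tensorflow as tf
from cloud_tpu_client import Client

# Align the TPU runtime with the client's TensorFlow version before connecting;
# otherwise ops built into the newer client binary (e.g. ParallelInterleaveDatasetV3)
# are missing from the binary running on the TPU workers.
Client().configure_tpu_version(tf.__version__, restart_type='ifNeeded')

# Standard TPU initialization (TF 2.2 API); with no arguments the resolver
# picks up the TPU name from the environment.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)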

Hope this helps.

tarrade commented 4 years ago

If you have issues, you can connect with me on Twitter (https://twitter.com/fabtar) and I can add you to a thread with TPU/GCP experts who helped me solve this issue.