Open jonyvp opened 4 years ago
It does work, though, when I define my model (and thus the TPU strategy) after loading the TFRecordDataset, i.e. first loading the dataset, then defining:
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
with strategy.scope():
I think we need a few more pieces of info here - is this a private bucket? are you using tensorflow-gcs-config? can you provide a self-contained repro notebook?
I ran into this on Colab with TF 2.8.0. I am also trying to instantiate a tf dataset from a TFRecord stored on GCS.
Any idea what the root cause may be? This used to work for me 2-3 months ago. I also tried what you did, defining the strategy AFTER loading the dataset, and it seems to work.
I think this is a TPU-related bug? I will try with a GPU next and see if the same error happens.
Update: Although I can sanity-test by iterating manually over the dataset, I got a further error when I tried model.fit(...)
This is a permissions issue: GCS prevents anonymous access to the bucket, so you must grant the TPU the right to use the data. Check the Permissions tab of your GCS bucket, add a new entry named service-495559152420@cloud-tpu.iam.gserviceaccount.com, and give it storage manager.
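For anyone who prefers to script the permission change described above, here is a minimal sketch of what the grant would look like. It only builds the IAM binding and an equivalent gsutil command as strings; it does not call GCS. The bucket name and the helper functions are hypothetical. The read-only role roles/storage.objectViewer is my assumption, not what the comment above recommends (it suggests a broader "storage manager" permission). Use the service account address from your own error message, since it may differ from the one quoted here.

```python
# Sketch only: builds (but does not apply) an IAM grant for a Cloud TPU
# service account on a GCS bucket. Nothing here talks to Google Cloud.

# The number from the service account address quoted in the comment above.
# Assumption: replace it with the one shown in your own permission error.
ACCOUNT_NUMBER = "495559152420"

# Hypothetical bucket name, for illustration only.
BUCKET = "my-tfrecords-bucket"

# Assumed read-only role; the comment above suggests a broader permission.
ROLE = "roles/storage.objectViewer"


def tpu_service_account(account_number: str) -> str:
    """Return the TPU service account address in the form quoted above."""
    return f"service-{account_number}@cloud-tpu.iam.gserviceaccount.com"


def iam_binding(account_number: str, role: str = ROLE) -> dict:
    """Return an IAM policy binding that grants `role` to the TPU account."""
    member = f"serviceAccount:{tpu_service_account(account_number)}"
    return {"role": role, "members": [member]}


def gsutil_command(account_number: str, bucket: str, role: str = ROLE) -> str:
    """Return a `gsutil iam ch` command string that adds the same grant."""
    # gsutil accepts the short role name, e.g. "objectViewer".
    short_role = role.split(".")[-1]
    member = f"serviceAccount:{tpu_service_account(account_number)}"
    return f"gsutil iam ch {member}:{short_role} gs://{bucket}"


if __name__ == "__main__":
    print(iam_binding(ACCOUNT_NUMBER))
    print(gsutil_command(ACCOUNT_NUMBER, BUCKET))
```

Running the printed gsutil command from a shell that is authenticated as a bucket owner should have the same effect as adding the entry through the Permissions tab.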
This yields a Permission error:
Describe the expected behavior: This is expected to work. Tested on a GPU runtime: works. An alternative would be to load the data onto the Colab local disk, but TPUs don't allow this data to be loaded.
The web browser you are using (Chrome, Firefox, Safari, etc.): Chrome