kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0

TPU Instance Creation #237

Open · zzj0402 opened this issue 2 years ago

zzj0402 commented 2 years ago

Please elaborate on the TPU creation process. Which TPU type should be used, and which software version? I am getting a "not enough space" error.

umm-maybe commented 1 year ago

Is the error you are getting something like this: "jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 32.00MiB (33554432B) on device ordinal 0: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well)." If so, I have the same problem. It doesn't seem to matter what size dataset I use...
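A note on reading this error, offered as an assumption rather than a confirmed diagnosis: the failing request is only 32 MiB (32 × 1024² = 33554432 bytes, matching the message), which suggests device memory is already nearly full when it is made. If so, that would be consistent with dataset size not mattering, because per-device memory during training is dominated by model state (parameters, gradients, optimizer buffers), not by the size of the dataset on disk. The sketch below makes that back-of-the-envelope estimate. Everything in it is hypothetical and not taken from this repo: the helper name `state_bytes_per_device`, fp32 values, an Adam-style optimizer with two buffers per parameter, state sharded evenly across devices, activations ignored, and the per-core HBM figures.

```python
# Hypothetical back-of-the-envelope estimate; none of these names or defaults
# come from mesh-transformer-jax. Assumes fp32 values, an Adam-style optimizer
# with two buffers per parameter, even sharding, and ignores activations.
MIB = 1024 ** 2
GIB = 1024 ** 3

# The size reported in the RESOURCE_EXHAUSTED message is exactly 32 MiB.
assert 32 * MIB == 33554432


def state_bytes_per_device(n_params: int, n_devices: int,
                           bytes_per_value: int = 4,
                           optimizer_slots: int = 2) -> float:
    """Bytes of parameters + gradients + optimizer slots held on each device.

    Note that dataset size is not an input: it does not enter this estimate.
    """
    copies = 1 + 1 + optimizer_slots  # params, grads, optimizer buffers
    return n_params * bytes_per_value * copies / n_devices


n_params = 6_000_000_000  # hypothetical ~6B-parameter model
per_device = state_bytes_per_device(n_params, n_devices=8)
print(f"{per_device / GIB:.2f} GiB per device")

# Assumed per-core HBM sizes, used only for illustration.
for label, hbm_gib in [("16 GiB per core", 16), ("8 GiB per core", 8)]:
    verdict = "fits" if per_device <= hbm_gib * GIB else "does not fit"
    print(label, verdict)
```

Under these assumptions the model state alone takes about 11.18 GiB per device, so it fits in 16 GiB but not in 8 GiB. Activations and XLA workspace then have to fit in whatever headroom is left, which is where a small allocation like 32 MiB can fail.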

zzj0402 commented 1 year ago

> Is the error you are getting something like this: "jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 32.00MiB (33554432B) on device ordinal 0: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well)." If so, I have the same problem. It doesn't seem to matter what size dataset I use...

No. I get a dependency installation error: the requirements fail to install on the Google Cloud TPU VM, so I haven't gotten that far yet.