nyx-ai / stylegan2-flax-tpu

🖼 Training StyleGAN2 on TPUs in JAX
https://nyx-ai.github.io/stylegan2-flax-tpu

Unable to run code on colab TPU. #5

Open waleedrazakhan92 opened 2 years ago

waleedrazakhan92 commented 2 years ago

Hi, my end goal is to train the StyleGAN model on a Google Colab TPU. Right now I'm following the instructions to first run inference on the provided checkpoints. Everything installs fine, but when I run the command below:

python generate_images.py \
    --checkpoint checkpoints/cookie-256.pkl \
    --seeds 0 42 420 666 \
    --truncation_psi 0.7 \
    --out_path generated_images

I get the error:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

followed by the error:

_pickle.UnpicklingError: pickle data was truncated

Please let me know what I am doing wrong.
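For readers hitting the same `UnpicklingError`: that message means the byte stream ended before the pickle was complete, and one possible cause is an incomplete checkpoint download. The sketch below uses only the standard library to test whether a file unpickles fully. The helper name `checkpoint_loads_cleanly` is hypothetical and not part of this repository, and the check assumes it runs in the same environment as the project, since a real checkpoint may reference classes that must be importable.

```python
import pickle


def checkpoint_loads_cleanly(path):
    """Try to unpickle the file at `path` and return (ok, message).

    Hypothetical helper for illustration. Only use it on files you
    trust: unpickling can execute arbitrary code.
    """
    try:
        with open(path, "rb") as f:
            pickle.load(f)
    except (pickle.UnpicklingError, EOFError) as exc:
        # Both exceptions can signal that the byte stream ended too early.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"
```

Calling `checkpoint_loads_cleanly("checkpoints/cookie-256.pkl")` and getting `False` back would suggest re-downloading the file before looking any further.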

mar-muel commented 2 years ago

Hi @waleedrazakhan92 - We've now added a Colab Notebook to our most recent release. Feel free to check it out! Please let us know if you find a way to speed up training on Colab TPU.

waleedrazakhan92 commented 1 year ago

Thanks for the notebook. The code runs fine up until the actual training part, where it prints:

WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
WARNING:jax.experimental.compilation_cache.compilation_cache:Initialized persistent compilation cache at jax_cache
2022-12-13 19:26:14,733 [INFO ] [__main__ ]: Starting new run with config:

So I take it that it's still not able to utilize the TPU.
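A quick way to confirm what JAX actually sees, before starting a long training run, is to list its devices. The sketch below is a minimal check, not a fix: `jax.devices()` is the real JAX call, while `summarize_platforms`, `accelerator_found`, and `report` are hypothetical helper names introduced here for illustration.

```python
def summarize_platforms(devices):
    """Count devices per platform, e.g. {"tpu": 8} or {"cpu": 1}."""
    counts = {}
    for d in devices:
        counts[d.platform] = counts.get(d.platform, 0) + 1
    return counts


def accelerator_found(devices):
    """Return True if any device reports a platform other than "cpu"."""
    return any(d.platform != "cpu" for d in devices)


def report():
    # Imported lazily so the helpers above work without JAX installed.
    import jax

    devices = jax.devices()
    print("platforms:", summarize_platforms(devices))
    if not accelerator_found(devices):
        print("JAX sees only CPU devices; training would run on CPU.")
```

Running `report()` in a notebook cell before training shows whether the warning above reflects the real device list. If only CPU devices appear, it may be worth checking that the Colab runtime type is set to TPU.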

jimb2834 commented 1 year ago

@mar-muel - Hello, I have the same issue as of 12/21.

By the way, we are trying to use TPU

Thanks for sharing this, it's very useful for education. Any idea why this is the case?
