googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Problem PyTorch/XLA TPU Profiling in Google Colab #4769

Closed alekseybaev closed 2 weeks ago

alekseybaev commented 1 month ago

I do machine learning in Google Colab, using the TPU runtime. Previously, I used the following command for setup: !pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl This worked fine, but recently it stopped working: when I run the cell with this command, a 403 error message appears. How can I fix this?

Browser: Chrome. An internet search did not turn up any results.

sagelywizard commented 4 weeks ago

Hello! We migrated to a different TPU architecture (see https://github.com/googlecolab/colabtools/issues/4481 for details), so I suspect that's the issue you're encountering.

So, there are two problems:

  1. That wheel was specifically for the "TPU Node" architecture, which we no longer support. "TPU Node" architecture meant that your TPU was hosted on a separate machine from your notebook. We migrated to the newer "TPU VM" architecture, where the TPU is attached to the VM hosting your notebook.
  2. The wheel you're trying to install is an old/deprecated wheel. It is not owned by our team; it's managed by the PyTorch/XLA folks. I believe they removed it because the architecture it targeted is no longer supported, which would explain the 403 error.

We preinstall a supported version of PyTorch/XLA. I'd recommend porting your notebook to use the preinstalled version of PyTorch/XLA on TPU.
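To see which versions are actually preinstalled before porting anything, a check along these lines could be run in a fresh cell. This is a minimal sketch, not an official Colab tool: `installed_versions` is a hypothetical helper name, and the package names passed to it are assumptions about how the distributions are named on the runtime.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torch_xla")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in this environment.
            versions[name] = None
    return versions


# Print one line per package so missing ones are easy to spot.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If `torch_xla` shows up as not installed, the runtime is probably not a TPU runtime, or the distribution is registered under a different name than the one assumed here.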

alekseybaev commented 3 weeks ago

Hi! Thanks for the answer and the recommendation. I don't understand what you mean by porting my notebook. Could you explain?

sagelywizard commented 3 weeks ago

Hi alekseybaev!

I'm suggesting that you remove this line from your notebook: !pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl Why? That version of PyTorch/XLA has been deprecated and removed. That's why you're seeing an error. If you remove that line, your notebook should use the recent version of PyTorch that we preinstall on the system.

You'll need to update your notebook to use a more recent version of PyTorch/XLA. It may require some work to migrate your notebook from PyTorch 2.0.0 to a more recent version of PyTorch.
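As a rough first step in that migration, a smoke test like the one below can confirm that the preinstalled PyTorch/XLA can place a tensor on an XLA device once the old pip line is gone. It is only a sketch: it assumes the preinstalled version exposes `torch_xla.core.xla_model` with an `xla_device()` function (newer releases may prefer a different entry point), and `xla_smoke_test` is a hypothetical helper name.

```python
def xla_smoke_test():
    """Run a tiny computation on the XLA device.

    Returns the result as a float, or None if torch / torch_xla
    cannot be imported in this environment.
    """
    try:
        import torch
        import torch_xla.core.xla_model as xm  # assumed to be preinstalled
    except ImportError:
        return None

    device = xm.xla_device()              # the TPU on a TPU runtime
    x = torch.ones(2, 2, device=device)   # 2x2 tensor of ones on that device
    return (x + x).sum().item()           # .item() copies the scalar back to the host


result = xla_smoke_test()
if result is None:
    print("torch or torch_xla is not importable in this runtime")
else:
    print(f"XLA computation result: {result}")
```

If this runs but the rest of the notebook fails, the remaining errors are more likely API changes between PyTorch 2.0.0 and the preinstalled version than a problem with the runtime itself.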

I can't give generic recommendations on how to do this migration, since it depends on the details of your notebook. If you encounter issues migrating your notebook, I'd recommend asking for help on StackOverflow or with the PyTorch/XLA folks.

Hope that's helpful!

alekseybaev commented 2 weeks ago

Thank you! I will try to implement your recommendations.