GarrettLee opened 3 years ago
For those who came here with the same problem: you can install https://pypi.org/project/cloud-tpu-profiler/ and then run this in the Colab terminal:
capture_tpu_profile --service_addr xx.xx.xx.xx:port --monitoring_level 2
The address is displayed when you initialize the TPU service.
You can also view the address as follows:
import tensorflow as tf
tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # TPU detection
print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
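To make the connection explicit, here is a minimal sketch of how a worker entry from the resolver feeds the capture_tpu_profile command above. The address 10.0.0.2:8470 is hypothetical, standing in for what tpu.cluster_spec().as_dict()['worker'] returns on a real TPU runtime:

```python
# Hypothetical worker list, in the 'host:port' shape the resolver returns.
workers = ['10.0.0.2:8470']

# The first worker address is what --service_addr expects.
cmd = f'capture_tpu_profile --service_addr {workers[0]} --monitoring_level 2'
print(cmd)
```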
Also, you can emulate a terminal in Colab by following the article here.
We recently added a CLI for checking this! Check out the tpu-info
utility on our TPU runtime.
The output looks like this:
TPU Chips
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
┃ Device ┃ Type ┃ Cores ┃ PID ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
│ /dev/accel0 │ TPU v2 chip │ 2 │ 1344 │
│ /dev/accel1 │ TPU v2 chip │ 2 │ 1344 │
│ /dev/accel2 │ TPU v2 chip │ 2 │ 1344 │
│ /dev/accel3 │ TPU v2 chip │ 2 │ 1344 │
└─────────────┴─────────────┴───────┴──────┘
Connected to libtpu at grpc://localhost:8431...
TPU Chip Utilization
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Core ID ┃ Memory usage ┃ Duty cycle ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ 0 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 1 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 2 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 3 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 4 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 5 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 6 │ 0.00 GiB / 7.48 GiB │ 0.00% │
│ 7 │ 0.00 GiB / 7.48 GiB │ 0.00% │
└─────────┴─────────────────────┴────────────┘
@sagelywizard hi, is there a way to install this on a cloud tpu vm?
Yep.
pip install git+https://github.com/google/cloud-accelerator-diagnostics/#subdirectory=tpu_info
awesome, thanks
Is there something like nvidia-smi? I have tried https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/profiling_tpus_in_colab.ipynb?authuser=2#scrollTo=mNA__vniyY8e, but the TensorBoard page shows "No dashboards are active for the current data set."