Open radna0 opened 2 months ago
A CLI tool to show XLA device usage
TPU device: xla:0 memory: 0.0 / 16.62 GB
TPU device: xla:1 memory: 0.0 / 16.62 GB
TPU device: xla:2 memory: 0.0 / 16.62 GB
TPU device: xla:3 memory: 0.0 / 16.62 GB
TPU device: xla:4 memory: 0.0 / 16.62 GB
TPU device: xla:5 memory: 0.0 / 16.62 GB
TPU device: xla:6 memory: 0.0 / 16.62 GB
TPU device: xla:7 memory: 0.0 / 16.62 GB
Total TPU memory: 0.0 / 132.96 GB
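The report above is one line per device plus a summed total (8 × 16.62 GB = 132.96 GB). A minimal sketch of that formatting step follows. It is hypothetical: `DeviceMemory` and `format_report` are illustrative names, the figures are hard-coded rather than read from the devices, and the decimal places are chosen only to match the sample output. Where a real tool would obtain per-device memory (for example from torch_xla or the TPU runtime) is left open.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DeviceMemory:
    # Hypothetical record; a real tool would fill this from the runtime.
    name: str        # device name, e.g. "xla:0"
    used_gb: float   # memory currently in use, in GB
    total_gb: float  # memory capacity, in GB


def format_report(devices: Sequence[DeviceMemory]) -> str:
    """Render one line per device, then a line with the summed totals."""
    lines = [
        f"TPU device: {d.name} memory: {d.used_gb:.1f} / {d.total_gb:.2f} GB"
        for d in devices
    ]
    used = sum(d.used_gb for d in devices)
    total = sum(d.total_gb for d in devices)
    lines.append(f"Total TPU memory: {used:.1f} / {total:.2f} GB")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hard-coded values mirroring the sample: 8 idle devices of 16.62 GB each.
    sample = [DeviceMemory(f"xla:{i}", 0.0, 16.62) for i in range(8)]
    print(format_report(sample))
```

Keeping the formatter separate from data collection means the same output code could sit on top of whatever monitoring source turns out to be usable from a second process.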
@will-cromar we already have the tpu-info CLI tools.
🚀 Feature
A CLI tool to show XLA device usage
Motivation
No two workers can use the same XLA device at the same time, so there is no way to run one script for monitoring and another for using the devices.
Pitch
There is currently no way to monitor TPU/XLA device usage.
Alternatives
nvidia-smi, nvitop, rocm-smi, htop, jax-smi
Additional context