Status: Open · allenwang28 opened 1 year ago
I'm happy to take this one, I mostly intended to open this as a way to track this request!
@allenwang28 Thank you for the contribution. Let us know how it goes.
cc: @richardliaw
This fits into recent XLA work @scv119 has been reviewing.
Description
Using accelerators.md as a reference point, we can graduate TPUs from custom resources (i.e. marked as `--resources={"TPU": 1}`) to native resources. To teach Ray how to detect TPUs within TPU VMs, we can check whether TPU drivers exist by polling `/dev/accel*` (e.g. with a shell glob, or in Python).
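A minimal sketch of the Python-side check described above. It only assumes that a TPU VM exposes one device file per chip matching `/dev/accel*`; the glob pattern is parameterized (a detail added here, not from the thread) so the function can be exercised on machines without TPUs:

```python
import glob

def num_tpu_chips(pattern: str = "/dev/accel*") -> int:
    """Count TPU device files matching `pattern` (one per chip on a TPU VM)."""
    return len(glob.glob(pattern))

def tpu_present(pattern: str = "/dev/accel*") -> bool:
    """True if at least one TPU device file exists."""
    return num_tpu_chips(pattern) > 0
```

On a host without TPU drivers both calls simply report zero chips, so the detection degrades gracefully.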
We can also get the accelerator version from instance metadata, but with a caveat: that metadata is attached to the TPU VM instance, so this approach will not work on GKE/KubeRay.
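A hedged sketch of that metadata lookup. It assumes the standard GCE metadata server endpoint and the `accelerator-type` attribute key (the exact key the thread's elided snippet used is not shown); per the caveat above, the lookup returns nothing on GKE/KubeRay or any non-GCP host:

```python
import urllib.request
from typing import Optional

# GCE instance-metadata endpoint; requests must carry the
# "Metadata-Flavor: Google" header or the server rejects them.
_METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/attributes/{key}"
)

def metadata_request(key: str) -> urllib.request.Request:
    """Build the metadata-server request for one attribute key."""
    return urllib.request.Request(
        _METADATA_URL.format(key=key),
        headers={"Metadata-Flavor": "Google"},
    )

def get_accelerator_type(timeout: float = 2.0) -> Optional[str]:
    """Return the accelerator type string, or None when the metadata
    server is unreachable (e.g. GKE/KubeRay, non-GCP machines)."""
    try:
        with urllib.request.urlopen(
            metadata_request("accelerator-type"), timeout=timeout
        ) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        return None
```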
There are some inconsistencies with GPUs that can be tricky:
Use case
Instead of
and
we can do
and
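The exact before/after snippets are elided above, so the following is only an illustrative sketch of the contrast as I read it. Today the operator hand-declares the custom resource (`ray start --resources='{"TPU": 4}'`) and tasks request it via the `resources` dict; with native detection, `ray start` would populate the TPU resource itself from the driver check, leaving only the task-side request. The helper below (a hypothetical name, not a Ray API) just builds that request dict:

```python
def tpu_task_options(num_chips: int) -> dict:
    """Options a task could pass to ray.remote, assuming TPUs remain
    addressable through the resources dict once detection is native.
    Purely illustrative; not part of Ray's API."""
    return {"resources": {"TPU": num_chips}}

# e.g. @ray.remote(**tpu_task_options(4)) on a cluster where the TPU
# resource was auto-detected rather than passed on the command line.
```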