Closed JiahaoYao closed 2 years ago
@scv119
TPU support has been added to Ray.
Hey @jjyao, can you explain how to use it?
I'm running a notebook in a TPU VM environment, and Ray itself shows CPU: 0.0/96.0 - TPU 0.0/4.0 in the trial status table,
but when I set this:
    tune.with_resources(train_model, resources={"cpu": 96, "tpu": 4}),
it throws:
Error: No available node types can fulfill resource request {'CPU': 96.0, 'tpu': 4.0}. Add suitable node types to this cluster to resolve this issue.
Also, do we need this block?
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
    except ValueError:
        strategy = tf.distribute.get_strategy()
If yes, where should it be placed? And where should strategy be used: should tune.Tuner be wrapped with strategy, or should model.fit in train_model(config) be wrapped with strategy?
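One detail in the report above may be worth checking: the status table lists the resource as TPU, while the request uses the key tpu, and the error message echoes 'tpu': 4.0 unchanged even though 'cpu' was rewritten to 'CPU'. Custom resource names are plain string keys, so a case mismatch is a plausible (but unconfirmed) cause. The sketch below is plain Python with no Ray dependency; `match_resource_names` is a hypothetical helper, not a Ray API, and the `available` dict is made up to mirror the status table.

```python
def match_resource_names(requested, available):
    """Rewrite requested keys to the spelling the cluster advertises.

    Hypothetical helper, not part of Ray. Matching is case-insensitive;
    a name the cluster does not advertise at all raises KeyError.
    """
    by_lower = {name.lower(): name for name in available}
    matched = {}
    for name, amount in requested.items():
        key = name.lower()
        if key not in by_lower:
            raise KeyError(f"cluster advertises no resource named {name!r}")
        matched[by_lower[key]] = amount
    return matched


# Made-up totals mirroring the status table in the report (CPU 96, TPU 4).
available = {"CPU": 96.0, "TPU": 4.0}
requested = {"cpu": 96, "tpu": 4}

resources = match_resource_names(requested, available)
print(resources)
```

In a live session, `available` could come from `ray.cluster_resources()`, and the result would be passed as `resources=` to `tune.with_resources`. Whether that alone resolves the error is an assumption, not something confirmed in this thread.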
Description
When we call ray.init(), it would be great if TPU devices were supported automatically. I found that GPU instances are currently detected like this:
https://sourcegraph.com/github.com/ray-project/ray/-/blob/python/ray/worker.py?L544-555
More specifically, GPUs are detected like this:
https://github.com/ray-project/ray/blob/e142bb3874bb14d76da4fbd2d3808595fb6265d6/python/ray/_private/utils.py?q=CUDA_VISIBLE_DEVICE#L266-L307
I wish Ray could also support TPUs. Possible examples of detecting the TPU cores are:
https://cloud.google.com/tpu/docs/cloud-tpu-tools
https://stackoverflow.com/questions/70783509/google-cloud-tpu-capture-tpu-profile-no-trace-event-is-collected-after-n-attem
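To make the request concrete, here is a minimal sketch of what such detection could look like, by analogy with the CUDA_VISIBLE_DEVICES-based GPU detection linked above. It rests on an assumption that is not taken from the links: that each TPU chip on the VM appears as one /dev/accel<N> device file. `count_tpu_chips` and `tpu_resources` are hypothetical names, not Ray functions.

```python
import glob
import os


def count_tpu_chips(pattern="/dev/accel*"):
    """Count device files matching `pattern`.

    Assumption: each TPU chip on the VM shows up as one /dev/accel<N> entry.
    """
    return len(glob.glob(pattern))


def tpu_resources(pattern="/dev/accel*"):
    """Build a custom-resource dict, or an empty dict when no chip is found."""
    chips = count_tpu_chips(pattern)
    return {"TPU": chips} if chips else {}


# On a machine without matching device files this prints an empty dict.
print(tpu_resources())
```

Until detection is automatic, the resulting dict could be handed to Ray explicitly, for example as `ray.init(resources=tpu_resources())`, so that tasks and trials can request the TPU resource by name.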
Use case
https://github.com/ray-project/ray/issues/22251