Open christy opened 2 years ago
@stephanie-wang @sven1977 Could you help triage?
Also, a quick question for @christy: is this a regression from previous versions?
On the GPUtil issue (the first one): I can reproduce the error with a fresh colab notebook that doesn't have a GPU. I noticed we are using both GPUtil and gpustat -- the latter seems better maintained, can we only use that? If we use both of them, that puts us at a severe disadvantage since then we inherit the bugs from both packages. cc @richardliaw
Something might have been fixed since I first tried Ray 2.0 on Colab. My code did not work the first time I tried it there.
Right now, ray.init() works on the default Colab runtime as long as you do not import GPUtil.
Should we remove all WARNING messages that tell the user to install GPUtil?
@richardliaw FYI: Philipp commented above that we should replace GPUtil with gpustat.
What happened + What you expected to happen
Ray does not easily run on Google Colab
Steps to reproduce:
What you get: ValueError: invalid literal for int() with base 10: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
Workaround: Do not import GPUtil unless the Colab runtime is configured with a GPU.
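For anyone who needs the same notebook to run on both CPU and GPU runtimes, here is a minimal sketch of the workaround as a guard. It is not what Ray does internally. The helper names `gpu_available` and `import_gputil_if_safe` are made up for illustration, and the sketch assumes that a successful `nvidia-smi -L` call is a good enough signal that the driver is reachable.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True only if nvidia-smi exists and can list at least one GPU.

    Hypothetical helper: assumes `nvidia-smi -L` exits non-zero (or prints
    no GPU lines) when it cannot communicate with the NVIDIA driver.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


def import_gputil_if_safe():
    """Import GPUtil only when a GPU is reachable; otherwise return None."""
    if not gpu_available():
        return None
    try:
        import GPUtil  # optional dependency, may not be installed
    except ImportError:
        return None
    return GPUtil


gputil = import_gputil_if_safe()
print("GPUtil imported" if gputil is not None else "GPUtil skipped (no usable GPU)")
```

On a default Colab runtime without a GPU this should take the "skipped" branch, so GPUtil is never imported before ray.init() is called.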
RLlib does not easily run on Colab
Steps to reproduce
What you get: Never-ending Warning messages about insufficient resources. WARNING insufficient_resources_manager.py:128 -- Ignore this message if the cluster is autoscaling. You asked for 1.0 cpu and 0 gpu per trial, but the cluster only has 2.0 cpu and 0 gpu.
Workaround:
Versions / Dependencies
Python: 3.7.13 (default Colab)
ray: 2.0.0
Number of CPUs in this system: 2 (default Colab runtime)
Number of GPUs in this system: 0 (default Colab runtime)
Reproduction script
Issue Severity
No response