ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.21k stars 5.61k forks

[Autoscaler] google-cloud-storage seems cannot read GOOGLE_APPLICATION_CREDENTIALS #25308

Open orcahmlee opened 2 years ago

orcahmlee commented 2 years ago

What happened + What you expected to happen

I am using Ray to load files from Google Cloud Storage. According to the google-cloud-storage documentation, if no project is passed, the client falls back to reading the credentials file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable.

With a plain Python function, this works as expected.

def normal_func():
    storage_client = storage.Client()
    return {
        "env_foo": os.getenv("foo"),
        "env_google": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "project_name": storage_client.project,
    }

However, when I wrap the same function as a Ray remote function, it raises OSError: Project was not passed and could not be determined from the environment.

@ray.remote
def ray_func():
    storage_client = storage.Client()
    return {
        "env_foo": os.getenv("foo"),
        "env_google": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "project_name": storage_client.project,
    }

The full traceback is:

Traceback (most recent call last):
  File "zztmp.py", line 36, in <module>
    print(ray.get(ray_func.remote()))
  File "/Users/andrew/miniconda3/envs/plumber/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/andrew/miniconda3/envs/plumber/lib/python3.8/site-packages/ray/worker.py", line 1763, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::ray_func() (pid=37377, ip=127.0.0.1)
  File "zztmp.py", line 17, in ray_func
    storage_client = storage.Client()
  File "/Users/andrew/miniconda3/envs/plumber/lib/python3.8/site-packages/google/cloud/storage/client.py", line 161, in __init__
    super(Client, self).__init__(
  File "/Users/andrew/miniconda3/envs/plumber/lib/python3.8/site-packages/google/cloud/client/__init__.py", line 318, in __init__
    _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
  File "/Users/andrew/miniconda3/envs/plumber/lib/python3.8/site-packages/google/cloud/client/__init__.py", line 269, in __init__
    raise EnvironmentError(
OSError: Project was not passed and could not be determined from the environment.

My current workaround is to pass an arbitrary string as the project argument, after which the OSError no longer appears.

@ray.remote
def ray_func():
    storage_client = storage.Client("")  # <--- passing any string works, e.g. "", "tmp", "wrong-project-name"
    return {
        "env_foo": os.getenv("foo"),
        "env_google": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "project_name": storage_client.project,
    }
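
Rather than passing an arbitrary string, one option is to read the real project ID from the key file itself. The sketch below assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account JSON key, which carries a "project_id" field. project_from_key_file is a hypothetical helper name for illustration, not part of google-cloud-storage or Ray.

```python
import json
import os
from typing import Optional


def project_from_key_file(path: Optional[str] = None) -> Optional[str]:
    """Return the project_id stored in a service-account key file, if any.

    Hypothetical helper: falls back to GOOGLE_APPLICATION_CREDENTIALS when
    no path is given. Returns None when the variable is unset, the file
    cannot be read or parsed, or the field is absent.
    """
    path = path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # Missing/unreadable file, or content that is not valid JSON.
        return None
    if not isinstance(data, dict):
        return None
    return data.get("project_id")
```

Inside the remote function, the result could then be passed as storage.Client(project=...) in place of the placeholder string. The caller should handle the None case explicitly, since it means the key file was not found or has no project_id.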

Versions / Dependencies

# OS: macOS 12.2.1

Python 3.8
google-cloud-storage     2.2.1
ray                      1.11.0
# OS: 
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Python 3.8
google-cloud-storage          2.3.0
ray                           1.12.0

Reproduction script

import os
from google.cloud import storage
import ray

def normal_func():
    storage_client = storage.Client()
    return {
        "env_foo": os.getenv("foo"),
        "env_google": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "project_name": storage_client.project,
    }

@ray.remote
def ray_func():
    storage_client = storage.Client()  # This will raise OSError
    # storage_client = storage.Client("")  # Just pass any string, it will succeed
    return {
        "env_foo": os.getenv("foo"),
        "env_google": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "project_name": storage_client.project,
    }

if __name__ == "__main__":
    print(normal_func())

    runtime_env = {
        "working_dir": ".",
        "env_vars": {
            "foo": "bar",
            "GOOGLE_APPLICATION_CREDENTIALS": "credential.json",
        },
    }
    ray.init(runtime_env=runtime_env)
    print(ray.get(ray_func.remote()))
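
One way to narrow this down might be to compare how the credentials path looks from the driver and from the worker. The script sets a relative path ("credential.json"), and a relative path is resolved against the working directory of whichever process opens it. This is only a diagnostic sketch using the standard library; describe_credentials_env is a hypothetical helper name, and it makes no claim about the actual root cause.

```python
import os
from typing import Any, Dict


def describe_credentials_env(
    var: str = "GOOGLE_APPLICATION_CREDENTIALS",
) -> Dict[str, Any]:
    """Report how a credentials path looks from the current process."""
    raw = os.getenv(var)
    # A relative value is resolved against this process's working directory.
    resolved = os.path.abspath(raw) if raw else None
    return {
        "cwd": os.getcwd(),
        "raw_value": raw,
        "resolved_path": resolved,
        "file_exists": bool(resolved) and os.path.isfile(resolved),
    }
```

Returning describe_credentials_env() from both normal_func and ray_func would show whether the two processes see the same file.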

Issue Severity

No response

DmitriGekhtman commented 2 years ago

@cadedaniel would you mind taking a look at this and triaging this issue?

cadedaniel commented 2 years ago

Hi @orcahmlee. I want to thank you for the excellent bug report and reproduction script!

I've looked into this and believe I have the root cause: storage.Client() succeeds in version 2.2.1 of Google's google-cloud-storage package, but fails in version 2.3.0. This explains why you saw this failure in a remote Ray function but not locally -- your Mac has the working version, but the Ray node has the up-to-date but failing version.

I downgraded the dependencies on my Ray node to google-cloud-storage 2.2.1 (same as my Mac). Once I did this, your reproduction script no longer crashes.

Does this solution work for you?
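
To check for a version mismatch like this, the installed package versions can be collected in each environment and compared. A minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); installed_versions is a hypothetical helper name:

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling installed_versions(["google-cloud-storage", "ray"]) once in the driver and once inside a @ray.remote function, then comparing the two results, would show whether the driver and the worker run different versions.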

orcahmlee commented 2 years ago

Hi @cadedaniel, sorry that my reproduction script was not clear. I have already updated it. In the new reproduction script, within ray_func:

This happens with both google-cloud-storage 2.2.1 and 2.3.0.

Would you mind running the new reproduction script on your Mac with google-cloud-storage 2.2.1?