jasonbrancazio closed this issue 2 years ago
@jasonbrancazio
Please pass your project into `init` explicitly:

`aiplatform.init(project='my-project')`
An alternative is to retrieve it from the environment if it's the same project the CustomJob is launched from:

`aiplatform.init(project=os.environ.get("CLOUD_ML_PROJECT_ID"))`
You can pass your credentials into `init` as well:

`aiplatform.init(credentials=creds)`
You can pass the entire configuration as one call:

`aiplatform.init(project='my-project', credentials=creds, experiment='test')`
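The `creds` object above is never constructed in this thread. One way to build it, sketched under the assumption that Application Default Credentials are available and that the broad cloud-platform scope is what you want (the function name here is made up):

```python
# Sketch: building a scoped credentials object to pass to aiplatform.init().
# This is the broad Cloud Platform scope; narrower scopes may suffice
# depending on which APIs your job actually touches.
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

def scoped_default_credentials():
    """Fetch Application Default Credentials with the cloud-platform scope."""
    import google.auth  # deferred so this snippet loads without google-auth installed
    creds, project = google.auth.default(scopes=[CLOUD_PLATFORM_SCOPE])
    return creds, project
```

You could then call `aiplatform.init(project=project, credentials=creds)` with the returned pair.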
@sasha-gitg can you provide more details about how to instantiate the credentials? I'm trying to avoid copying a service account .json file to the Docker image.
It's interesting that I can access BigQuery and Cloud Storage inside a CustomJob without using a service account, but I can't initialize the aiplatform module.
I think I found a relevant comment in the Vertex AI documentation: https://cloud.google.com/vertex-ai/docs/general/access-control#grant_service_agents_access_to_other_resources
"Note: If you want your custom training code to obtain an OAuth 2.0 access token with the https://www.googleapis.com/auth/cloud-platform scope, then you must use a custom service account for training. You cannot give this level of access to the Vertex AI Custom Code Service Agent."
Looks like I'm stuck with a custom service account. This is a limitation that the Vertex AI team should consider addressing. It seems unusual that someone should have to use a custom service account just to run aiplatform.init() in a CustomJob.
For the sake of completeness, I'm confirming that I resolved my issue using something like the code snippet below inside a CustomJob using a custom service account. Thanks @sasha-gitg
```python
# in a task.py called at the start of a CustomJob...
import os

import google.auth
from google.cloud import aiplatform

# Path where the service-account key file was copied into the image
CREDENTIAL_PATH = '/training_app/creds.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = CREDENTIAL_PATH
credentials, project = google.auth.default()

aiplatform.init(project='my-project', experiment='my-experiment')
aiplatform.start_run(run='fakerun2')

# log your hyperparams to the experiment at the start of the run
aiplatform.log_params({'test_param': .01})

# train a model in your CustomJob, then log metrics
aiplatform.log_metrics({'test_metric': 1})
```
As you can see, I first tested by copying a service account .json file into the image and setting GOOGLE_APPLICATION_CREDENTIALS. But there is an easier and more secure way that is not well documented.
You can simply specify the email address of a service account when running a CustomJob with the Python client, rather than passing credentials to aiplatform.init() as @sasha-gitg suggested or copying the service account .json file into the Docker image. (Note that you can't use the UI to run the CustomJob if you go this route.)
You can give your custom service account the same "Vertex AI Custom Code Service Agent" IAM role that is used by the service agent.
Running the custom job then looks something like this:
```python
SERVICE_ACCOUNT_EMAIL_ADDRESS = 'some_service_account@your_project.iam.gserviceaccount.com'

# display_name, worker_pool_specs, and staging_bucket are defined elsewhere
custom_job = aiplatform.CustomJob(
    display_name=experiment_run_name,
    worker_pool_specs=worker_pool_specs,
    staging_bucket=staging_bucket,
)

custom_job.run(
    service_account=SERVICE_ACCOUNT_EMAIL_ADDRESS,
    enable_web_access=True,
)
```
It was fun to use enable_web_access to figure this out: you can temporarily modify your application code so the container enters an infinite loop, then navigate to the CustomJob in the console, click the link to open a terminal for debugging, and use ipython to see what privileges the running container has.
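The "infinite loop" trick described above can be as simple as a guarded sleep loop at the top of the training script; this is a sketch, and the flag name is made up:

```python
import time

# Temporary debugging hook: when DEBUG_HOLD is True, the container idles
# instead of training, so you can open the web-access terminal from the
# Cloud Console and inspect the environment interactively (e.g. with ipython).
DEBUG_HOLD = False  # flip to True while debugging, then remove before real runs

while DEBUG_HOLD:
    time.sleep(60)
```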
These docs were helpful, in particular the statement "If you are creating a CustomJob, specify the service account's email address in CustomJob.jobSpec.serviceAccount": https://cloud.google.com/vertex-ai/docs/general/custom-service-account#attach
Note that the Python client does not have a way to specify CustomJob.jobSpec.serviceAccount directly. I had to read the source code for aiplatform.CustomJob and its run() method to see how to specify the service account email.
Closing the issue since it seems fixed. Feel free to reopen if there are other related issues.
I want to use metadata store experiment tracking with CustomJobs so I can log parameters and metrics.
When I run a CustomJob with a custom container in Vertex AI, I get an ACCESS_TOKEN_SCOPE_INSUFFICIENT error when I try to initialize the aiplatform SDK with aiplatform.init().
I've tried to remedy this error by passing scoped credentials to aiplatform.init(), but as you can see from the stacktrace below, it does not work.
I can successfully run aiplatform.init() and create an experiment on my laptop using ipython when not passing any credentials or passing credentials received from google.auth.default(). In this case I'm using application default credentials for my user, which is the owner of my project.
I can also run aiplatform.init() in ipython on my laptop with a service account that has only the Vertex AI Custom Code Service Agent role. This was an experiment to attempt to mirror the role granted to the AI Platform Custom Code Service Agent when Vertex AI runs a CustomJob.
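For that laptop test with a service account, the credentials can be loaded from a key file along the lines of the sketch below; the function name is made up and the scope is illustrative:

```python
# Sketch: loading scoped credentials from a service-account key file.
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

def credentials_from_key_file(key_path):
    """Load scoped credentials from a service-account JSON key file."""
    from google.oauth2 import service_account  # deferred import
    return service_account.Credentials.from_service_account_file(
        key_path, scopes=[CLOUD_PLATFORM_SCOPE]
    )
```

The resulting object can be passed as `credentials=` to `aiplatform.init()`.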
If I temporarily upgrade the AI Platform Custom Code Service Agent to an owner role and run the custom container, I still get the error. The issue thus seems to relate to OAuth scoping, not role assignment.
To reproduce, I've provided a minimal example. Build this Dockerfile, push it to Container Registry, and create a CustomJob using the web UI for Vertex Training. The failure occurs when aiplatform.init() is called. The stacktrace shows the error arises specifically when get_or_create from metadata_store.py is called.
Here is the stacktrace I received: