Google Vertex AI Vector Store indexEndpoint in other project

TJMaria commented 10 months ago

I'm running into an issue using:

Text Embedding/Google Vertex AI
Vector Store/Google Vertex AI Matching Engine

I am deploying a Node.js app in Google Cloud, which should have the right service account and permissions. Let's say my Node.js app is deployed in project A.

My indexEndpoint is deployed in a different project, project B.

I can't seem to find a way to force the MatchingEngine to look for the indexEndpoint in project B. The only place I can pass the projectId is through the authOptions, but that doesn't seem to work.

dosubot[bot] commented 10 months ago

🤖

Hello,

To configure the Google Vertex AI Matching Engine in your NodeJs app deployed in project A to locate the indexEndpoint in a different project, project B, you need to ensure that the service account used for authentication in project A has the necessary permissions to access the resources in project B.

This can be achieved by adding the service account of project A to project B with the appropriate roles. Once this is done, you can specify the index and indexEndpoint of the Matching Engine in project B when initializing the MatchingEngine object in your NodeJs app.

Here's how you can do it:

1. Add the service account of project A to project B with the necessary roles. This can be done in the Google Cloud Console.
2. In your NodeJs app, when initializing the MatchingEngine object, specify the index and indexEndpoint of the Matching Engine in project B.

const config: MatchingEngineArgs = {
  index: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX!, // Index in Project B
  indexEndpoint: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT!, // Index Endpoint in Project B
  apiVersion: "v1",
  docstore: store,
  authOptions: {
    keyFilename: '/path/to/keyfile.json', // Path to the service account key file
    scopes: ['https://www.googleapis.com/auth/cloud-platform'] // Scopes for the service account
  }
};

const engine = new MatchingEngine(embeddings, config);

In the above code, GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX and GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT should be set to the index and indexEndpoint of the Matching Engine in project B. The authOptions should be set with the path to the service account key file and the necessary scopes.

Please note that the service account used for authentication should have the necessary permissions to access the resources in both project A and project B.

You can find more information about this in the Google Vertex AI integration documentation and the Google Vertex AI Matching Engine code in the langchainjs repository.

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 10 months ago

🤖

Hello,

Based on your code snippet and the information you provided, here are the answers to your questions:

1. Yes, you do need to pass the service account key file in the authOptions when initializing the MatchingEngine object. This is because the MatchingEngineArgs interface extends GoogleVertexAIConnectionParams<GoogleAuthOptions>, and the GoogleAuthOptions is used in the GoogleAuth constructor in each of the connection classes (IndexEndpointConnection, RemoveDatapointConnection, UpsertDatapointConnection, FindNeighborsConnection). This is how the Google Cloud SDK authenticates your requests.
2. The index and indexEndpoint are only the IDs, not the entire path including the projectID. This is evident from the buildUrl methods in the connection classes where the projectId is fetched separately and then combined with the index or indexEndpoint to form the full URL.
3. The service account will not by default use the correct project. The projectId is fetched using the getProjectId method of the GoogleAuth client, which will return the project ID that the service account is associated with. If you want to use a different project, you would need to use a service account that is associated with that project, or manually specify the project ID when forming the URL.
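
To make the point about URL construction concrete, here is a minimal sketch of how a request URL could be assembled from a project ID that is resolved separately from the endpoint ID. The helper `buildIndexEndpointUrl` and the `EndpointParams` type are hypothetical names used for illustration and are not the langchainjs implementation; the project IDs and endpoint ID are placeholders.

```typescript
// Hypothetical helper, not part of langchainjs: shows how the project ID
// ends up in the request URL independently of the indexEndpoint ID.
interface EndpointParams {
  projectId: string; // in practice resolved from the credentials
  location: string; // e.g. "us-central1"
  apiVersion: string; // e.g. "v1" or "v1beta1"
  indexEndpoint: string; // bare ID, not a full resource path
}

function buildIndexEndpointUrl(p: EndpointParams): string {
  const host = `https://${p.location}-aiplatform.googleapis.com`;
  const resource =
    `projects/${p.projectId}/locations/${p.location}` +
    `/indexEndpoints/${p.indexEndpoint}`;
  return `${host}/${p.apiVersion}/${resource}`;
}

// Same endpoint ID, different resolved project: the two URLs differ.
const shared = {
  location: "us-central1",
  apiVersion: "v1",
  indexEndpoint: "1234567890",
};
const urlA = buildIndexEndpointUrl({ ...shared, projectId: "project-a" });
const urlB = buildIndexEndpointUrl({ ...shared, projectId: "project-b" });
console.log(urlA);
console.log(urlB);
```

If the project ID comes from credentials tied to project A, the request is addressed to project A even though the endpoint ID belongs to project B, which would match the symptom described in this issue.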

Please note that the above information is based on the current implementation of the MatchingEngine and related classes in the LangChain Javascript framework. If you encounter any issues or have further questions, feel free to ask.

Best regards.

Sources

langchain/src/vectorstores/googlevertexai.ts

TJMaria commented 10 months ago

@dosu-bot OK, so I HAVE TO pass the service account key file? Linking the service account in the GCP project is not enough?

And the index and indexEndpoint are only the IDs, correct? Not the entire path including the projectID? The service account should by default use the correct project?

dosubot[bot] commented 10 months ago

🤖

Hello,

To configure the Google Vertex AI Matching Engine to locate the indexEndpoint in a different project, Project B, you need to ensure that the service account linked to your Node.js application has the necessary permissions to access resources in Project B.

You can pass the service account key file in the authOptions when creating an instance of the MatchingEngine class. The authOptions should include the path to the service account key file.

The index and indexEndpoint should be specified in the format projects/{project}/locations/{location}/indexes/{index} and projects/{project}/locations/{location}/indexEndpoints/{indexEndpoint} respectively.

However, the MatchingEngine class in the langchainjs codebase does not accept a projectId in the authOptions to override the default project used by the service account. The projectId is retrieved from the credentials object within the getProjectId method of the WebGoogleAuth class. There is no provision in the provided code to override the projectId using authOptions.

Here is an example of how you can do this:

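// Import paths assume the package entry points published by langchain at the
// time of this thread, rather than the repository's src/ files.
import { MatchingEngine } from "langchain/vectorstores/googlevertexai";
import { GoogleVertexAIEmbeddings } from "langchain/embeddings/googlevertexai";
import { GoogleCloudStorageDocstore } from "langchain/stores/doc/gcs";

// Initialize your embeddings and docstore. Concrete classes are used here
// because the Embeddings and Docstore base classes cannot be instantiated.
const embeddings = new GoogleVertexAIEmbeddings();
const docstore = new GoogleCloudStorageDocstore({
  bucket: process.env.GOOGLE_CLOUD_STORAGE_BUCKET!, // bucket name is an assumed env var
});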

// Specify the authOptions with the service account key file
const authOptions = {
  keyFilename: '/path/to/service-account-key.json', // Path to the service account key file
};

// Specify the index and indexEndpoint
const index = 'projects/project-b/locations/location/indexes/index';
const indexEndpoint = 'projects/project-b/locations/location/indexEndpoints/indexEndpoint';

// Initialize the MatchingEngine
const matchingEngine = new MatchingEngine(embeddings, {
  docstore,
  authOptions,
  index,
  indexEndpoint
});

In this example, replace /path/to/service-account-key.json with the path to your service account key file, project-b with the ID of Project B, location with the location of your resources, index with the ID of your index, and indexEndpoint with the ID of your index endpoint.

In the context shared, the GoogleVertexAI class in the langchain/src/llms/googlevertexai/index.ts file also uses the GoogleAuth class from the google-auth-library package to authenticate requests to the Vertex AI API. This class accepts an options object that can include the keyFilename and projectId properties, similar to the MatchingEngine class. This means that you can use the same authOptions object to authenticate requests to the Vertex AI API when using the GoogleVertexAI class.

Please let me know if you have any other questions.

kevsjh commented 10 months ago

AFAIK, the Vertex AI store in langchainjs is a pure Node.js implementation, which means that it depends on the google-auth-library package. By default, the google-auth-library package uses the service account of the resource that the Node.js application is deployed on.

There are two ways to override the default service account:

1. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable as described in the langchainjs documentation. This variable should point to the path of the JSON file that contains the credentials for the service account that you want to use.
2. Grant cross-account service access to the service account from project A. This will allow the service account to access resources in project B. To do this, you need to grant the Vertex AI User role to the service account from project A in project B.

TJMaria commented 10 months ago

@kevsjh Thanks for your input, I also spotted your issue.

When working locally I was using the gcloud CLI to authenticate (using my own account): gcloud auth application-default login

But now that I'm struggling to get it working in the deployed app, I decided to try it locally with a keyfile from the service account of project B (where the indexEndpoint is located). But I also can't seem to get that to work.

I'm using a JSON keyfile path in GOOGLE_APPLICATION_CREDENTIALS.

I get a 401 error, even if the service account has the Owner role:

        code: 401,
        message: 'Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.',
        errors: [
          {
            message: 'Invalid Credentials',
            domain: 'global',
            reason: 'authError',
            location: 'Authorization',
            locationType: 'header'
          }
        ],
        status: 'UNAUTHENTICATED'
      }

Only when passing a keyfile from my personal account, generated by gcloud auth application-default login does GOOGLE_APPLICATION_CREDENTIALS work for me.
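
One way to narrow this down is to check which kind of credentials file is actually being loaded. The sketch below is a hypothetical debugging helper (the names `summarizeCredentials` and `CredentialSummary` are not from any library) that reports the `type`, `project_id`, and `client_email` fields of a credentials JSON document without printing secrets. The sample input is illustrative only; a real service account key file contains more fields.

```typescript
// Summarize a credentials JSON document without exposing secrets.
// Hypothetical helper for debugging; not part of google-auth-library.
interface CredentialSummary {
  type: string;
  projectId?: string;
  clientEmail?: string;
}

function summarizeCredentials(json: string): CredentialSummary {
  const parsed = JSON.parse(json) as Record<string, unknown>;
  const asString = (v: unknown): string | undefined =>
    typeof v === "string" ? v : undefined;
  return {
    type: asString(parsed["type"]) ?? "unknown",
    projectId: asString(parsed["project_id"]),
    clientEmail: asString(parsed["client_email"]),
  };
}

// Illustrative input only; these values are placeholders.
const sample = JSON.stringify({
  type: "service_account",
  project_id: "project-b",
  client_email: "my-sa@project-b.iam.gserviceaccount.com",
});
const summary = summarizeCredentials(sample);
console.log(summary);
```

In practice the JSON string would be read from the path in GOOGLE_APPLICATION_CREDENTIALS. A file produced by gcloud auth application-default login is expected to report a type of authorized_user rather than service_account, which makes it easy to tell the two apart.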

TJMaria commented 10 months ago

@kevsjh
When I try using the cross-account approach, the request is made in the wrong project. So then I get a 403 because the resource doesn't exist.

TJMaria commented 10 months ago

As far as I can see, the authOptions are not being passed to the indexEndpointClient from the MatchingEngine constructor: https://github.com/langchain-ai/langchainjs/blob/4f56cf7ceb7f1abfc3f903c9d8a9bdb0c3d5b176/langchain/src/vectorstores/googlevertexai.ts#L389

That creates a client with new (empty) authOptions: https://github.com/langchain-ai/langchainjs/blob/4f56cf7ceb7f1abfc3f903c9d8a9bdb0c3d5b176/langchain/src/vectorstores/googlevertexai.ts#L65
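
The pattern described here can be illustrated with a simplified sketch. All class and type names below are hypothetical and this is not the langchainjs source; it only contrasts a constructor that drops the caller's options with one that forwards them to the inner client.

```typescript
// Simplified illustration with hypothetical names; not the langchainjs source.
interface AuthOptions {
  projectId?: string;
  keyFilename?: string;
}

class EndpointClient {
  readonly authOptions: AuthOptions;
  constructor(authOptions: AuthOptions = {}) {
    this.authOptions = authOptions;
  }
}

// Pattern described above: the caller's options never reach the client.
class EngineDroppingOptions {
  readonly client: EndpointClient;
  constructor(_authOptions: AuthOptions) {
    this.client = new EndpointClient();
  }
}

// Forwarding variant: the client sees the same options as the engine.
class EngineForwardingOptions {
  readonly client: EndpointClient;
  constructor(authOptions: AuthOptions) {
    this.client = new EndpointClient(authOptions);
  }
}

const opts: AuthOptions = { projectId: "project-b" };
const dropped = new EngineDroppingOptions(opts).client.authOptions.projectId;
const forwarded = new EngineForwardingOptions(opts).client.authOptions.projectId;
console.log(dropped, forwarded);
```

With the first variant, the inner client falls back to whatever default credentials and project it can discover, regardless of what was passed to the outer constructor.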

kevsjh commented 8 months ago

It seems that in the updated version, Google auth is passed in to the class correctly, but both the Google auth docs and the langchainjs docs on Vertex AI authentication seem a little lacking.

The below should be the correct parameters to pass in.

const engine = new MatchingEngine(embeddings, {
  index: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX!,
  indexEndpoint: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT!,
  apiVersion: "v1beta1",
  docstore: store,
  location: "us-central1",
  authOptions: {
    scopes: ["https://www.googleapis.com/auth/cloud-platform"],
    projectId: process.env.GOOGLE_APPLICATION_PROJECT_ID,
    credentials: {
      type: "service_account",
      client_email: process.env.GOOGLE_APPLICATION_CLIENT_EMAIL,
      private_key: process.env.GOOGLE_APPLICATION_PRIVATE_KEY,
    },
  },
});

I have tested the above in a Next.js 14 App Router API route on my local machine in two setups: after gcloud auth logout, and while logged in with gcloud but with a different project ID set. Both were meant to confirm that the service account from the config is actually used, so this should work in production as well once the correct environment variables are set.

Another important note: the value of the private_key environment variable should be wrapped in double quotes ("") rather than single quotes (''). You will get the error Error: error:1E08010C:DECODER routines::unsupported if the value is single-quoted.

So the correct env entry is:

GOOGLE_APPLICATION_PRIVATE_KEY="-----BEGIN PRIVATE KEY .... -----END PRIVATE KEY-----\n"
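
A likely explanation, which is an assumption and not confirmed in this thread, is that the env file loader only expands \n escapes inside double-quoted values, so a single-quoted key keeps literal backslash-n sequences and cannot be decoded as PEM. A defensive sketch is to normalize the value before passing it on. The helper name `normalizePrivateKey` is hypothetical and the key text below is a placeholder, not a real key.

```typescript
// Convert literal "\n" sequences (backslash followed by n) into real
// newlines. Values that already contain real newlines pass through unchanged.
// Hypothetical helper; not part of langchainjs or google-auth-library.
function normalizePrivateKey(raw: string): string {
  return raw.replace(/\\n/g, "\n");
}

// Placeholder text standing in for a value read from process.env.
const fromEnv =
  "-----BEGIN PRIVATE KEY-----\\nabc\\n-----END PRIVATE KEY-----\\n";
const pem = normalizePrivateKey(fromEnv);
console.log(pem);
```

The normalized value could then be used for private_key in the credentials object instead of the raw environment variable.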

TJMaria commented 7 months ago

I ended up stepping away from LangchainJs and implementing it myself. @kevsjh Thanks for taking a look, but I can't confirm whether this would solve the issues I had.

dosubot[bot] commented 4 months ago

Hi, @TJMaria,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. It seems like you were experiencing difficulty configuring the Google Vertex AI Matching Engine to look for the indexEndpoint in a different project than the NodeJs app. Despite several responses providing guidance on how to configure the Matching Engine, it appears that you ultimately decided to implement the solution independently.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

langchain-ai / langchainjs