googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0

Vertex AI RAG Shared Drive access #4032

Open davidADSP opened 3 months ago

davidADSP commented 3 months ago

The docstring for the Vertex AI RAG service here:

https://github.com/googleapis/python-aiplatform/blob/b5c3cdd737acd695301c9a564d8f91371288f9f1/vertexai/preview/rag/rag_data.py

says that you can use it with Google Drive or Google Cloud Storage links:

import vertexai

vertexai.init(project="my-project")
# Google Drive example
paths = ["https://drive.google.com/file/123", "https://drive.google.com/file/456"]
# Google Cloud Storage example
paths = ["gs://my_bucket/my_files_dir", ...]

response = vertexai.preview.rag.import_files(
    corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
    paths=paths,
    chunk_size=512,
    chunk_overlap=100,
)

I have found that this doesn't work with Google Shared Drive links, even if the service account is added as per the documentation.

Is this intentional? If so, it would be useful to add this caveat to the documentation here:

https://cloud.google.com/vertex-ai/generative-ai/docs/llamaindex-on-vertexai#supported-doc-types

yinghsienwu commented 3 months ago

Could you share the error message? You may strip out any sensitive info. Thanks!
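
If it helps, here is a minimal sketch of one way to scrub identifiers from the message before pasting it. The `redact` helper and its patterns are hypothetical and not part of the SDK; they only assume that the message may contain resource names, Drive links, or service account addresses shaped like the ones in the example above, so adjust them to whatever your error actually contains.

```python
import re

# Hypothetical helper, not part of the Vertex AI SDK: masks identifiers that
# may appear in an error message before it is shared in a public issue.
_PATTERNS = [
    # Resource name segments, e.g. projects/my-project/.../ragCorpora/my-corpus-1
    (re.compile(r"projects/[^/\s]+"), "projects/<project>"),
    (re.compile(r"ragCorpora/[^/\s]+"), "ragCorpora/<corpus>"),
    # Google Drive links: keep the host, mask everything after it
    (re.compile(r"https://drive\.google\.com/\S+"), "https://drive.google.com/<redacted>"),
    # Service account e-mail addresses
    (re.compile(r"[\w.+-]+@[\w-]+\.iam\.gserviceaccount\.com"), "<service-account>"),
]


def redact(message: str) -> str:
    """Return `message` with each pattern above replaced by its placeholder."""
    for pattern, replacement in _PATTERNS:
        message = pattern.sub(replacement, message)
    return message


# Intended use (assumption: the failing call raises an exception):
#
#     try:
#         response = vertexai.preview.rag.import_files(...)
#     except Exception as exc:
#         print(redact(str(exc)))
```

The placeholders keep the overall shape of the message (location, resource type, host), which is usually what matters for debugging.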