jsalfity closed this issue 6 months ago
The issue here is that the Google AI Studio API is not as feature-rich as the Vertex AI API, so we'll have to use the Vertex AI SDK, which means doing things in Google Cloud land. The basic rundown: make a Google Cloud project, enable the Vertex AI API in that project, and upload the relevant video file to GCS. Then open the project's Cloud Shell and install the Python SDK with
pip install "google-cloud-aiplatform>=1.38"
From there you can write a Python script that sends videos to Gemini through Vertex AI. Here's a sample one:
import vertexai
from vertexai.generative_models import GenerativeModel, Part


def prompt_video(project_id: str, location: str) -> str:
    # Initialize Vertex AI
    vertexai.init(project=project_id, location=location)
    # Load the model
    multimodal_model = GenerativeModel("gemini-1.0-pro-vision")
    # Query the model
    response = multimodal_model.generate_content(
        [
            # Video, needs to be uploaded to GCS
            Part.from_uri("gs://cloud-samples-data/video/animals.mp4", mime_type="video/mp4"),
            "Explain what's happening in this video",  # Prompt
        ]
    )
    print(response)
    # Return the generated text so the function matches its -> str annotation
    return response.text


prompt_video("august-apricot-415506", "us-central1")  # probably different project_id
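The sample above assumes the video already sits in GCS. For the upload step, here is a minimal sketch using the `google-cloud-storage` client library, which would need to be installed separately with `pip install google-cloud-storage`. The helper names (`gcs_uri`, `guess_video_mime_type`, `upload_video`) are made up for illustration, the bucket is assumed to exist already, and falling back to `video/mp4` for unknown extensions is an assumption, not something the Vertex AI docs prescribe.

```python
import mimetypes
from pathlib import Path


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build a gs:// URI of the form passed to Part.from_uri."""
    return f"gs://{bucket_name}/{blob_name.lstrip('/')}"


def guess_video_mime_type(filename: str) -> str:
    """Guess the MIME type from the file extension.

    Falls back to video/mp4 when the guess is missing or not a video
    type (an assumption made for this sketch).
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type and mime_type.startswith("video/"):
        return mime_type
    return "video/mp4"


def upload_video(project_id: str, bucket_name: str, local_path: str) -> str:
    """Upload a local video to an existing bucket and return its gs:// URI."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    blob_name = Path(local_path).name
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(
        local_path, content_type=guess_video_mime_type(local_path)
    )
    return gcs_uri(bucket_name, blob_name)
```

The string returned by `upload_video` could then replace the hard-coded `gs://cloud-samples-data/...` URI in `Part.from_uri`, with `guess_video_mime_type` supplying the `mime_type` argument.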
Progress: Set up Google Cloud through a personal account, cloned the code, and ran it via Cloud Shell. Main addition:
Stored test data in GCS under Cloud Storage --> Buckets --> 'task_decomposition_data' in the project gen-lang-client-0368774908.
Error: Generic 500 internal server error when calling the API.
Forgot to close. This all worked. Thank you, @bchen32
Gemini Pro Vision states it can support video and text input through the Python API. However, when calling with a video, the server responds with:
So... that means the Python API can't accept videos?
How is this different from the HTTP request: https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini#gemini-pro-vision
To reproduce, set the config to
llm_model: 'gemini-pro-vision'
with use_video: True.
Then uncomment https://github.com/jsalfity/task_decomposition/blob/1991fc4034ee06a20ac65110d79e724a876cf7e8/task_decomposition/utils/querying.py#L130-L144
and comment out https://github.com/jsalfity/task_decomposition/blob/1991fc4034ee06a20ac65110d79e724a876cf7e8/task_decomposition/utils/querying.py#L147-L154.
Then run
python task_decomposition/analysis/query_LLM.py