Audio Timestamp option missing in Generation Config

googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.

Apache License 2.0

628 stars 341 forks source link

Audio Timestamp option missing in Generation Config #4511

Open Waheguru-Anurag opened 5 days ago

Waheguru-Anurag commented 5 days ago

Hi I was reading the vertex ai documentation - [https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/audio-understanding#:~:text=files%2C%20enable%20the-,audioTimestamp,-parameter%20in%20GenerationConfig]

Here it is mentioned:

Audio-only timestamps: To accurately generate timestamps for audio-only files, you must configure the audio_timestamp parameter in generation_config.

But I am not able to set this parameter in generation_config.

jaycee-li commented 3 days ago

Hi @Waheguru-Anurag, this field was recently added (last week) and is currently only available in the REST API. The Python SDK hasn't been updated to support it yet.

tfriedel commented 2 days ago

I tried this parameter using the REST API but didn't notice an improvement. For a 3 min long file timestamps suggested it was over 4 minutes long.

I used this prompt:

Translate the audio to english. Include timestamps and speakers. Use the following format:

<example>
[00:17] Agent (male): Yes, sir. So, you have a shop that sells medicines, fertilizers, and seeds?
[00:19] Customer (male): Hmm.
[00:21] Agent (male): Sir, I have this app, sir, for retailers.
</example>