Code for audio transcription using AI:

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def upload_to_gemini(path, mime_type=None):
    """Uploads the given file to Gemini.

    See https://ai.google.dev/gemini-api/docs/prompting_with_media
    """
    file = genai.upload_file(path, mime_type=mime_type)
    print(f"Uploaded file '{file.display_name}' as: {file.uri}")
    return file

# Create the model

generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
)

# TODO Make these files available on the local file system
# You may need to update the file paths

files = [
    upload_to_gemini("generative-ai_audio_audio_covid_google_response.mp3", mime_type="audio/mpeg"),
]

chat_session = model.start_chat(
    history=[
        {
            "role": "user",
            "parts": [
                files[0],
                "Generate audio diarization, including transcriptions and speaker information for each transcription, for this interview. Organize the transcription by the time they happened.",
            ],
        },
        {
            "role": "model",
            "parts": [
                "## Audio Diarization \n\n0:11 - 0:35 Karen: Well, hello everyone. Helpfulness has always been at the heart of Google's mission, and so since the outbreak of the Covid-19 pandemic, we wanted to share some of the ways that we've been helping. We have launched more than 200 products and features and contributed more than a billion dollars in resources to help our users, our partners, everyone to get through this pandemic. I want to highlight a few of the areas we've been focused on before we broaden the conversation. You've probably heard the news about Google's collaboration with Apple. We announced this last April, and it's a combined effort using Bluetooth technology to help government and health agencies slow the spread of the virus. That partnership resulted in exposure notification apps that are used to alert you if you've been in contact with someone who's tested positive for COVID. These apps are available in more than 50 countries, and the research is showing that it's helping to save lives. \n\n0:35 - 0:54 Karen: Over the past year, we've been working closely with the World Health Organization, the CDC and others to provide trusted, authoritative information for the public. When you search on COVID-19, you'll see official information from these public health entities about symptoms, prevention, treatments, and references like maps and graphs to orient you to the state of the pandemic. \n\n0:54 - 1:03 Karen: Last September, we released the COVID-19 Search Trends: Symptoms data set, which includes search trends for more than 400 symptoms. This data has been helping researchers and public health study the link between symptom-related searches and the spread of COVID-19",
            ],
        },
    ]
)

response = chat_session.send_message("INSERT_INPUT_HERE")

print(response.text)

Screenshot 2024-11-06 235028
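For use in a subtitle tool, the diarized reply would still need to be turned into timed segments. Below is a minimal sketch of that step, assuming the reply keeps the "M:SS - M:SS Speaker: text" line shape seen in the sample history above. The names `parse_diarization`, `to_srt_time`, and `to_srt` are hypothetical helpers, not part of the Gemini SDK or of pyvideotrans, and real replies may vary in format.

```python
import re

# Assumed line shape, taken from the sample reply in the chat history:
#   "0:11 - 0:35 Karen: Well, hello everyone. ..."
SEGMENT_RE = re.compile(
    r"^(\d+):(\d{2})\s*-\s*(\d+):(\d{2})\s+([^:]+):\s*(.+)$"
)


def parse_diarization(text):
    """Parse 'M:SS - M:SS Speaker: text' lines into segment dicts (hypothetical helper)."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip headings such as "## Audio Diarization" and blank lines
        start_min, start_sec, end_min, end_sec, speaker, speech = match.groups()
        segments.append({
            "start": int(start_min) * 60 + int(start_sec),  # seconds
            "end": int(end_min) * 60 + int(end_sec),        # seconds
            "speaker": speaker.strip(),
            "text": speech.strip(),
        })
    return segments


def to_srt_time(seconds):
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},000"


def to_srt(segments):
    """Render parsed segments as SRT text, one numbered cue per segment."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(blocks)
```

With the snippet above, this could be called as `to_srt(parse_diarization(response.text))`. Lines that do not match the assumed pattern are silently skipped, so a stricter caller may want to log them instead.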

Code:

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Create the model
generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
)

chat_session = model.start_chat(history=[])

response = chat_session.send_message("INSERT_INPUT_HERE")

print(response.text)

Screenshot 2024-11-06 235602

jianchang512 / pyvideotrans

Can you add speech-to-text using the Gemini API to pyvideotrans for Windows? I have given code in the description that you can check. I suggest it instead of the openai-whisper model because it transcribes with 94% accuracy. #606
