Swimburger opened 8 months ago
I noticed that the IAudioToTextService.GetTextContentsAsync method returns multiple TextContent instances.
We have APIs that return the transcript as sentences and as paragraphs.
Would it make sense to add options to AssemblyAIAudioToTextExecutionSettings that control whether the transcript is returned as a single TextContent, one TextContent per sentence, or one TextContent per paragraph?
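A minimal sketch of what the proposed option could look like, assuming a hypothetical TranscriptGranularity enum and a hypothetical TranscriptSplitter helper (neither exists in Semantic Kernel or the AssemblyAI SDK). Plain strings stand in for TextContent so the sketch compiles on its own, and the sentence and paragraph lists are assumed to have already been fetched from AssemblyAI's sentences and paragraphs APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option: how the transcript is split into output items.
public enum TranscriptGranularity
{
    Full,       // one item containing the whole transcript
    Sentences,  // one item per sentence
    Paragraphs, // one item per paragraph
}

// Hypothetical stand-in for the data the service would fetch from AssemblyAI.
public sealed record TranscriptParts(
    string FullText,
    IReadOnlyList<string> Sentences,
    IReadOnlyList<string> Paragraphs);

public static class TranscriptSplitter
{
    // Returns the text of each output item for the requested granularity.
    // In a real connector, each string would become its own TextContent.
    public static IReadOnlyList<string> Split(TranscriptParts parts, TranscriptGranularity granularity) =>
        granularity switch
        {
            TranscriptGranularity.Full => new[] { parts.FullText },
            TranscriptGranularity.Sentences => parts.Sentences,
            TranscriptGranularity.Paragraphs => parts.Paragraphs,
            _ => throw new ArgumentOutOfRangeException(nameof(granularity)),
        };
}
```

Defaulting such an option to Full would keep the current single-result behavior for existing callers.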
I would also add full realtime transcription to the TODO list: you send AudioContent or AudioStreamContent and get back an IAsyncEnumerable<StreamingTextContent>.
I want to add realtime, but I want to finalize and release non-realtime transcription first.
Our realtime solution uses a WebSocket connection, expects raw audio bytes to be sent continuously, and responds with partial and final transcript objects. This is mostly consistent with other realtime transcription services. I'd be happy to work with y'all on figuring out how to create a good abstraction that'll work for us and other realtime services.
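A minimal sketch of the consuming side of such an abstraction, assuming a hypothetical RealtimeTranscriptMessage record that carries the transcript text and an IsFinal flag; the actual AssemblyAI message format and the WebSocket handling are not shown. It turns a stream of partial and final messages into an IAsyncEnumerable<string> of finalized text, which a connector could wrap in StreamingTextContent.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape of one message received over the realtime WebSocket.
// A partial transcript may be revised by later messages; a final one is not.
public sealed record RealtimeTranscriptMessage(string Text, bool IsFinal);

public static class RealtimeTranscripts
{
    // Yields one string per finalized utterance and drops interim partials.
    public static async IAsyncEnumerable<string> FinalTextsAsync(
        IAsyncEnumerable<RealtimeTranscriptMessage> messages,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var message in messages.WithCancellation(cancellationToken))
        {
            if (message.IsFinal && message.Text.Length > 0)
            {
                yield return message.Text;
            }
        }
    }

    // Helper for trying the sketch: exposes an in-memory list as an async stream.
    public static async IAsyncEnumerable<RealtimeTranscriptMessage> FromList(
        IEnumerable<RealtimeTranscriptMessage> items)
    {
        foreach (var item in items)
        {
            await Task.Yield();
            yield return item;
        }
    }
}
```

Dropping partials keeps the output stable; a live-captioning UI would instead surface each partial and replace it once the final transcript arrives.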
Instead of using the AudioStreamContent, I'm introducing an AssemblyAI file service for users to upload their files to AssemblyAI. #5964
In the future, we can use a streaming audio content class for Streaming STT.
Now that we have the AssemblyAIAudioToTextService and AssemblyAIFileService in, I think we can release the initial version of this connector. What would the next steps be?
This PR uses the AssemblyAI SDK: https://github.com/microsoft/semantic-kernel/pull/8556
@RogerBarreto With the SDK PR merged, is it ready to be released?
Ping!
Pong!
Catch up with Roger on Discord; he may know when the team can review your code.
You should also take a look at the new Microsoft.Extensions.AI abstractions and implement them directly in your AssemblyAI SDK. This will be the new way of implementing connectors.
Motivation and Context
AssemblyAI is a speech AI company offering AI models through APIs. Adding a connector will help users integrate AssemblyAI easily with Semantic Kernel.
Description
This issue tracks the implementation progress of the AssemblyAI connector. Current implementation: ASSEMBLYAI BRANCH
TODO
TextContent.InnerContent
AssemblyAIAudioToTextExecutionSettings
Potential additions