alanmilinovic opened 1 year ago
Maybe it could be an integration with an existing service that needs an API key. Or maybe there are open-source solutions for this.
This is a great project!
@m1k1o did you find some time to check the link I provided in my last comment? Could it potentially be accepted as a PR?
Yes, sure, as a plugin. Do you want to create a PR? I can help you with it. Just create a working PoC and I can clean it up and integrate it.
Sounds good, will do my best!
Can you give me some guidance? When you say plugin, I guess it should be added in the client part?
I would say it needs to be in the server as well. The plugin part does not need to confuse you; it can be hardcoded for the PoC. I meant that it should be a self-contained piece of code that can be turned on/off if needed.
Start with getting the audio from neko. It could come from either:

- the `<video>` element on the client,
- directly from the WebRTC track,
- or a new process in `supervisord.conf`, which would use the system output from pulseaudio.

The last option should be the easiest. You want to see if that software can consume such a source. Then having this feature would only be a matter of starting and stopping it. Displaying the text in the GUI can be done in a second step; first we only want to get the text in any form.
Hope this helps!
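To make the last option concrete, here is a minimal sketch of the server-side capture, assuming ffmpeg is available inside the container and the pulseaudio monitor source is named `auto_null.monitor` (both are assumptions about the environment, not part of neko itself):

```python
import subprocess


def build_capture_cmd(source: str = "auto_null.monitor",
                      out: str = "capture.wav") -> list[str]:
    """Build an ffmpeg command that records from a pulseaudio monitor source.

    `source` and `out` are placeholders; the real monitor name depends on
    the pulseaudio setup inside the container.
    """
    return [
        "ffmpeg",
        "-f", "pulse",      # use ffmpeg's pulseaudio input device
        "-i", source,       # monitor source, e.g. auto_null.monitor
        "-ac", "1",         # mono is enough for speech recognition
        "-ar", "16000",     # 16 kHz, the sample rate Whisper expects
        out,
    ]


# To actually record, something like:
# subprocess.run(build_capture_cmd(), check=True)
```

Keeping the command as a list makes it easy to later wire into a supervised process that the plugin starts and stops.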
So far I have managed to add Whisper to the google-chrome server image, where I am testing it.
There is a way to use Whisper from the command line or from Python. I am just not sure how to get the pulseaudio output. There are a lot of examples on the Whisper GitHub, but it is too complicated for me.
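For reference, a small sketch of driving Whisper on a single file from Python; the `--model` flag is part of the openai-whisper CLI, everything else here (file names, model choice) is just an example:

```python
def transcribe_file_cmd(audio_path: str, model: str = "base") -> list[str]:
    """Build the openai-whisper CLI invocation for one audio file."""
    return ["whisper", audio_path, "--model", model]


# Shell equivalent:  whisper out.mp3 --model base
#
# Python API equivalent (requires `pip install openai-whisper`):
#   import whisper
#   model = whisper.load_model("base")
#   print(model.transcribe("out.mp3")["text"])
#
# To run the CLI from Python:
#   import subprocess
#   subprocess.run(transcribe_file_cmd("out.mp3"), check=True)
```

This only covers the file-based case; getting live audio into it is the open question below.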
That's a good start!
I only found examples with files. Can it work with live sources, processing like a pipeline?
I see ffmpeg is being used under the hood. That could capture pulseaudio.
This is what I am reading at the moment; it might help.
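One common way to turn a live source into file-like chunks is to have ffmpeg write raw PCM to stdout and cut it into fixed-size windows, each of which can then be transcribed like a short file. A sketch, assuming s16le mono 16 kHz input (the window length is an arbitrary PoC choice):

```python
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16_000      # the sample rate Whisper expects
BYTES_PER_SAMPLE = 2      # s16le PCM
WINDOW_SECONDS = 5        # arbitrary window size for a PoC


def pcm_windows(stream: BinaryIO,
                seconds: int = WINDOW_SECONDS) -> Iterator[bytes]:
    """Yield fixed-length windows of raw PCM from a byte stream.

    The stream is assumed to be s16le mono at 16 kHz, e.g. the stdout of:
        ffmpeg -f pulse -i auto_null.monitor -f s16le -ac 1 -ar 16000 -
    Each yielded window can then be handed to Whisper.
    """
    window_bytes = seconds * SAMPLE_RATE * BYTES_PER_SAMPLE
    while True:
        chunk = stream.read(window_bytes)
        if not chunk:
            return
        yield chunk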
Do you know how to get the input device name of pulseaudio? I am not getting a list inside the container when trying to list them.
I will leave the issue open; maybe someone else can jump in. From what I learned it should all be possible, but my coding knowledge is too weak to finish it. There are multiple ways for sure, and it all looks feasible.
For gstreamer we use `auto_null.monitor`; it should work for ffmpeg as well, I'd say.
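Rather than hardcoding the name, it can be discovered by parsing `pactl list sources short`, whose lines are tab-separated (`index`, `name`, `driver`, `format`, `state`). A sketch of picking the first monitor source:

```python
from typing import Optional


def find_monitor_source(pactl_output: str) -> Optional[str]:
    """Pick the first monitor source from `pactl list sources short` output.

    Each line looks like: "<index>\t<name>\t<driver>\t<format>\t<state>".
    Returns None when no monitor source is listed.
    """
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            return fields[1]
    return None


# Typically combined with:
#   import subprocess
#   out = subprocess.run(["pactl", "list", "sources", "short"],
#                        capture_output=True, text=True).stdout
#   source = find_monitor_source(out)
```

Note this only works if `pactl` can actually reach a running daemon, which is exactly the problem discussed below.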
I am getting `auto_null.monitor: No such process` if I run `ffmpeg -f pulse -i auto_null.monitor out.mp3` inside the container.
Maybe pulseaudio is not running, or I am doing it in the wrong place?
I also get the message `No PulseAudio daemon running, or not running as session daemon.` when running a simple `pacmd` command.
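That error usually means the client cannot reach the daemon's socket, e.g. because the command runs as a different user than the one owning the pulseaudio session. A small diagnostic sketch; the environment variables are standard pulseaudio ones, but the exact setup inside a neko container is an assumption here:

```python
import os


def pulse_hints(env=os.environ):
    """Collect hints about why a pulseaudio daemon may be unreachable."""
    hints = []
    if "PULSE_SERVER" not in env:
        hints.append("PULSE_SERVER is not set; clients fall back to the "
                     "per-user default socket, which may belong to another user")
    xdg = env.get("XDG_RUNTIME_DIR")
    if not xdg:
        hints.append("XDG_RUNTIME_DIR is not set; pulse cannot locate the "
                     "session socket ($XDG_RUNTIME_DIR/pulse/native)")
    elif not os.path.exists(os.path.join(xdg, "pulse", "native")):
        hints.append("no pulse socket under XDG_RUNTIME_DIR; the daemon is "
                     "probably not running for this user")
    return hints
```

Running ffmpeg as the same user that owns the pulseaudio session (or pointing `PULSE_SERVER` at its socket) is the usual fix.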
It would be cool to have a live transcription feature, similar to Microsoft Teams. For example, you could have live lyrics when playing music via YouTube in the browser and have fun with your friends.