mutablelogic / go-whisper

Speech-to-Text in golang
https://pkg.go.dev/github.com/mutablelogic/go-whisper
Apache License 2.0
73 stars, 9 forks

go-whisper golang bindings #1

Open djthorpe opened 1 year ago

djthorpe commented 1 year ago

Create bindings for https://github.com/ggerganov/whisper.cpp

djthorpe commented 1 year ago

Made PR: https://github.com/ggerganov/whisper.cpp/issues/269

chrisbward commented 1 year ago

Great work!

Keen on realtime translation and a way of calling out/streaming the output to another app - gRPC seems the best option for this

djthorpe commented 1 year ago

Yeah thanks.

I'm doing the audio downsampling to 16 kHz at the moment in a different repository (go-media).
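For readers unfamiliar with the step: whisper.cpp expects 16 kHz mono float32 samples, so audio at other rates has to be resampled first. The sketch below is not the go-media implementation; `resampleLinear` is a hypothetical helper that uses plain linear interpolation and assumes the input is already mono. It applies no low-pass filter, so a production resampler (for example one built on ffmpeg) would give better quality when downsampling.

```go
package main

import "fmt"

// resampleLinear is a hypothetical helper (not part of go-whisper or go-media).
// It converts mono float32 samples from srcRate to dstRate using linear
// interpolation. No anti-aliasing filter is applied.
func resampleLinear(in []float32, srcRate, dstRate int) []float32 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	// Number of output samples covering the same duration as the input.
	n := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]float32, n)
	step := float64(srcRate) / float64(dstRate)
	for i := range out {
		pos := float64(i) * step // position in the input, in samples
		j := int(pos)
		frac := float32(pos - float64(j))
		if j+1 < len(in) {
			// Interpolate between the two neighbouring input samples.
			out[i] = in[j] + (in[j+1]-in[j])*frac
		} else {
			// Past the last pair: hold the final sample.
			out[i] = in[len(in)-1]
		}
	}
	return out
}

func main() {
	// One second of silence at 48 kHz, resampled to 16 kHz.
	src := make([]float32, 48000)
	dst := resampleLinear(src, 48000, 16000)
	fmt.Println(len(dst))
}
```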

The realtime transcription and translation should be fairly straightforward, though it is still quite experimental, even in whisper.cpp.

It will take me a while to get to the gRPC microservice :-(

djthorpe commented 1 year ago

Added a "stream" command for the start of real-time streaming, but:

There are also some issues with the segmenting in the main package (repeated segments come out!) that need fixing.

djthorpe commented 2 months ago

Coming back to this after some time!

Remaining tasks:

Lower priority:

djthorpe commented 2 months ago

Also:

djthorpe commented 2 months ago

Also:

djthorpe commented 2 months ago

Simplified the Dockerfile, which now uses the images from here as a base:

https://github.com/mutablelogic/docker-llamacpp

This is still not working; I now need to have the ffmpeg shared libraries included in the runtime image. I'm considering whether to just copy the libraries over from the build image or to install the ffmpeg libraries from source.