owickstrom / komposition

The video editor built for screencasters
https://owickstrom.github.io/komposition/
Mozilla Public License 2.0

Feature: speech recognition for visual feedback on audio #94

Open robinp opened 4 years ago

robinp commented 4 years ago

Could wire up speech recognition on the audio chunks to:

Bonus: add keyword / topic extraction.

owickstrom commented 4 years ago

Yes, this is something I've been wanting to add. But I haven't found any good tools to integrate with yet (maybe I haven't looked hard enough...)

robinp commented 4 years ago

You mean speech recognition? I had a good experience with CMU PocketSphinx, and even wrote some c2hs bindings for it. If you are interested, I can dig them up!

owickstrom commented 4 years ago

Sorry for dropping the ball on this one. PocketSphinx looks interesting, and it seems to be available on macOS, Linux, and Windows. Do you have the bindings published somewhere?

robinp commented 4 years ago

Just pushed after some dusting: https://github.com/TreeTide/voicetrans/tree/master/sphinx . Partial bindings, but good enough to run the recognition (see app/Main.hs to get model file and test input).