Open Matthias84 opened 2 years ago
This sounds like a great idea and I'm definitely interested in implementing it for the next version. We have been doing a lot of experimentation with languages and environments, and hope to add several improvements. It is on my todo list. Thanks!
Hi, I really appreciate your tool. It's a great way to make recordings more accessible for further investigation :smiley:
I read that Vosk also offers speaker identification / detection, and I'm wondering whether you could add this to mp4grep as well. For me there are a lot of nice use cases for tracking and analysing discussions (TV shows, movies, phone recordings, podcasts, web conferences, ...). It would enable research such as NLP or building knowledge bases, and it would make multimedia content more accessible to users with disabilities, all done with privacy in mind and without contributing to major tech companies' algorithms.
My understanding so far is that Vosk needs fingerprints for the different speakers, and maybe multiple fingerprints per person. So we would need a way to assign lines within a transcription to fingerprinted speakers and to give these fingerprints human-readable labels. In a second step, a final processing pass could assign these labels to every transcription line. Maybe we would also need an extended transcription format like WebVTT to share these assigned lines and timecodes?
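
To make the idea more concrete, here is a minimal sketch of the two steps in plain Python. It is not mp4grep or Vosk code. It assumes that every transcription line already carries a speaker vector (the `spk` key here is a hypothetical field name), that the labelled fingerprints are stored as a dict of lists, and that a cosine distance below an arbitrary threshold of 0.5 counts as a match. All function names are made up for illustration. The output uses WebVTT voice tags (`<v Name>`) to attach the label to each cue.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as "no similarity"
    return 1.0 - dot / (norm_a * norm_b)


def assign_speaker(vector, fingerprints, threshold=0.5):
    """Pick the label whose closest fingerprint is nearer than the threshold.

    `fingerprints` maps a human-readable label to a list of reference
    vectors, so one person can have several fingerprints.
    The threshold is an assumption and would need tuning on real data.
    """
    best_label, best_dist = "unknown", threshold
    for label, refs in fingerprints.items():
        for ref in refs:
            dist = cosine_distance(vector, ref)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(lines, fingerprints, threshold=0.5):
    """Label every transcription line and render the result as WebVTT cues."""
    out = ["WEBVTT", ""]
    for line in lines:
        speaker = assign_speaker(line["spk"], fingerprints, threshold)
        start = format_timestamp(line["start"])
        end = format_timestamp(line["end"])
        out.append(f"{start} --> {end}")
        out.append(f"<v {speaker}>{line['text']}")
        out.append("")
    return "\n".join(out)


# Toy data: two-dimensional vectors stand in for real speaker fingerprints.
fingerprints = {
    "Alice": [[1.0, 0.0], [0.9, 0.1]],
    "Bob": [[0.0, 1.0]],
}
lines = [
    {"start": 0.0, "end": 2.5, "text": "Hello there.", "spk": [0.95, 0.05]},
    {"start": 2.5, "end": 4.0, "text": "Hi!", "spk": [0.1, 0.9]},
]
print(to_webvtt(lines, fingerprints))
```

Keeping the fingerprints separate from the transcription means the labels can be changed or extended later without transcribing again; only the final labelling pass has to be repeated.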