natrys / whisper.el

Speech-to-Text interface for Emacs using OpenAI's whisper model and whisper.cpp as inference engine.

[feature] paragraph breakdown #6

Closed oatmealm closed 10 months ago

oatmealm commented 1 year ago

I'm wondering if whisper itself is able to break the text into paragraphs, i.e. when the speaker changes, after a long pause, etc. I'm wondering how Otter and Airgram are doing this, even though it's far from perfect. Is it at all possible? Is this what a "punctuation" model is for? I'm just guessing, since I'm not really familiar with the underlying technology.

oatmealm commented 1 year ago

OK, I'm reading this and I can see it's probably a workflow issue that can be solved with ChatGPT, for example.

https://github.com/openai/whisper/discussions/552

natrys commented 1 year ago

Yeah, that's one of my pain points too. I don't know if there is much we could do, or if it would even be in the scope of whisper.el. The ChatGPT stuff is interesting if it works, though some people seem to have very mixed results.

Generally speaking, it would architecturally be a good idea to define a couple of hooks that are run at certain points, say before and after whisper-run or whisper-file is called. Users could bring their own functions to do the necessary pre/post-processing. I haven't bothered because I am not sure what sort of code people would realistically want to run.

If this kind of transformation is possible to do, I guess we could explore those architectural options. And then even if we don't have wild stuff like using chatgpt merged in main whisper.el code, they could be put in the wiki or documentation.
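
To make the hook idea above concrete, here is a minimal sketch of the pattern in Python. It is not whisper.el code: the names `before_transcribe_hooks`, `after_transcribe_hooks`, and `run_transcription` are hypothetical and only illustrate how user-supplied functions could run before and after a transcription step.

```python
from typing import Callable, List

# Hypothetical hook lists (illustrative names, not whisper.el's API).
# "Before" hooks take no arguments; "after" hooks receive the
# transcribed text and return a possibly modified version of it.
before_transcribe_hooks: List[Callable[[], None]] = []
after_transcribe_hooks: List[Callable[[str], str]] = []


def run_transcription(transcribe: Callable[[], str]) -> str:
    """Run `transcribe`, calling user hooks before and after it."""
    for hook in before_transcribe_hooks:
        hook()
    text = transcribe()
    # Each post-processing hook sees the output of the previous one.
    for hook in after_transcribe_hooks:
        text = hook(text)
    return text
```

With this shape, something like a ChatGPT-based cleanup would just be one more function appended to the "after" list, and could live in a wiki rather than in the main code.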

oatmealm commented 1 year ago

Maybe it doesn't need to be more than that. Break it down at least based on some value set by the user? Just to make the text manageable?

natrys commented 1 year ago

Do you mean adding line breaks every N sentences (where you could set N to whatever)?

oatmealm commented 1 year ago

> Do you mean adding line breaks every N sentences (where you could set N to whatever)?

Yes. For me it makes sense, since it makes working with the text somewhat easier...

natrys commented 10 months ago

Arbitrary text formatting or transformation is now possible to do in a post-processing hook.

Your exact use case (paragraph break after every N sentences) is shown in the hooks section of the README.
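
For readers outside Emacs, the "paragraph break after every N sentences" transformation can be sketched roughly as follows. This is a Python illustration, not the snippet from the README; the function name `break_into_paragraphs` and the default `n=3` are assumptions made for the example. The sentence splitter is deliberately naive (it splits on whitespace after `.`, `!` or `?`), so abbreviations such as "e.g." will be treated as sentence ends.

```python
import re


def break_into_paragraphs(text: str, n: int = 3) -> str:
    """Insert a blank line after every `n` sentences of `text`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Naive split: whitespace that follows ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Group consecutive sentences into chunks of at most n.
    paragraphs = [
        " ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)
    ]
    return "\n\n".join(paragraphs)
```

Calling `break_into_paragraphs(transcript, 5)` would group the transcript into paragraphs of five sentences each, with any leftover sentences forming a shorter final paragraph.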