olivierkes / manuskript

An open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0
1.7k stars 226 forks

speech to text #1233

Open Reaper10 opened 9 months ago

Reaper10 commented 9 months ago

A speech-to-text engine might be good for the mobile app or desktop. For the speech-to-text engine, it may also be a good idea to add a way for the user to speak the Markdown formatting codes and have the right formatting codes automatically placed around whatever they want.
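As a rough illustration of the "spoken formatting codes" idea (not an existing Manuskript feature), a transcript could be post-processed so that spoken markers are replaced with Markdown delimiters. The command vocabulary ("start bold ... end bold") and the function name below are assumptions made up for this sketch.

```python
import re

# Hypothetical spoken command names mapped to Markdown delimiters.
# This vocabulary is an assumption for illustration only.
SPOKEN_MARKUP = {
    "bold": "**",
    "italic": "*",
    "code": "`",
}


def apply_spoken_formatting(transcript: str) -> str:
    """Wrap 'start <name> ... end <name>' spans in Markdown delimiters."""
    result = transcript
    for name, delimiter in SPOKEN_MARKUP.items():
        pattern = re.compile(
            rf"\bstart {name}\s+(.*?)\s+end {name}\b",
            flags=re.IGNORECASE | re.DOTALL,
        )
        # A function is used as the replacement so the delimiter and the
        # captured text are inserted literally, without escape handling.
        result = pattern.sub(
            lambda m, d=delimiter: f"{d}{m.group(1)}{d}", result
        )
    return result


if __name__ == "__main__":
    print(apply_spoken_formatting("she was start bold very end bold tired"))
```

Spans without a matching "end" marker are left untouched, so a misheard command does not corrupt the rest of the text.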

TheJackiMonster commented 9 months ago

This could be used for English at least.

obw commented 9 months ago

I think we should first make sure that all the functionality we already have runs well and fast!

The other thing is: isn't speech to text something that should be handled by the OS?

This would also be a new source of issues, requests and bugs! (Questions like "Why is language XYZ not supported?" alone could be time-consuming!)

I use a driver that extends the keyboard driver under Linux. It was written by a friend and will be open source in a few months. It should also run under macOS and Windows... I will also soon need a slightly better graphics card: I have 1 GB of RAM, but I will soon need at least 4 GB, better 8 GB...

I use it with a macro to take notes. For writing it's not good enough, though the problem could be that I write everything that isn't IT-related in German! Also, I don't like my stories when I dictate them. Some years ago I had a partner who loved my stories so much that one day I had a micro recorder in my pocket... I wrote too slowly for this special reader, even though I was fast at typing... but the dictated material took too much work to make usable!

I am also training a voice for TTS to generate my own audiobooks. Good models for English are easy to come by... But again, I don't write in English!

Reaper10 commented 9 months ago

@TheJackiMonster will it be built in?

TheJackiMonster commented 9 months ago

Maybe that would make sense in the future, when it supports more open models for other languages. The biggest benefit of the library I mentioned is that it comes with close to zero dependencies and runs on a CPU without issues, so the required specs stay quite low. It also runs in real time by compressing the audio to leave headroom for transcoding. It's a really impressive implementation.

Anyway, if you want to test it as a tool, the author also implemented a graphical user interface for it, which can be installed as a Flatpak on any Linux distribution. It's meant for generating live captions for video or audio. It works pretty well in my testing.

However, to get text for writing you probably want to test the example application from the library's repository. It runs on the command line, which allows copying the text into your documents. The biggest downside is that you don't get proper punctuation or capitalization. But I would assume some grammar correction tools could work with the output it generates.

In general I agree with @obw that we have other priorities right now. Additionally, it's not as simple as dropping the library into Manuskript and having everything work. It has downsides. But I wanted to link it here because it's quite impressive technically and would definitely make such a feature possible to some degree.

But I also agree that dictated stories still require a lot of manual work. People do not talk the way they write or read, so that should be kept in mind with such an implementation. I would argue it might be helpful, but it's not a replacement for ordinary writing. So it makes sense to improve the performance and reliability of writing first.

TheShadowOfHassen commented 8 months ago

I think this is an issue that can be addressed in a new plugin system, not by official Manuskript. I do like the idea though. Call me strange, but I'd love a system like JARVIS from the Iron Man movies to help me start a story.

obw commented 8 months ago

@TheShadowOfHassen: I'd love to have a defined plugin system for Manuskript!

Jarvis as a concept is cool, but as I had an AI say in one of my stories:

"Leben liebt es Energie zu sparen und bei Homo Sapiens verbraucht denken überraschen viel Energie!"

"Life loves to save energy and in Homo Sapiens, thinking uses a surprising amount of energy!"

That was the answer to the question of what the biggest threat to humankind is! This is also one reason for me not to use tools like ChatGPT!

TheShadowOfHassen commented 8 months ago

@obw it wouldn't be like ChatGPT. My thought would be something that can quickly switch between characters, world building, and outline, so I can just get my ideas out as quickly as possible.

I would hate having a ChatGPT plugin for Manuskript. I wouldn't use it, and I'd hate it for the same reasons you give.

TheJackiMonster commented 8 months ago

I had some random contact with the author of the software I mentioned above, and they pointed me towards another project for voice-to-text conversion. It supports a wide range of languages, which is much more desirable for software like Manuskript. It does not run in real time, however. Still, it might be usable as an import function for voice recordings. The text results can be very good.