hysaordis / JarvisAi

Text in, text out #1

Open joelmnz opened 1 week ago

joelmnz commented 1 week ago

Hi @hysaordis, you have done a great job converting Dan's concept to C#, thanks.

I would like to propose a change (configurable) to allow users to input text and receive text responses.

Windows has very good speech recognition via Windows + H, and I would like to use it rather than a transcription service.

Also, browsers like Chrome have built-in text-to-speech. It sounds a bit tinny, but it's OK for fast, local (free) TTS.

If I hack something like that up, would you be interested in merging it into this branch?

hysaordis commented 1 week ago

Hi @joelmnz, thank you very much for the recognition! I'm glad to hear that my work on bringing Dan's concept to C# has been useful to you and the community.

Regarding your suggestion, I’d like to point out that the README isn’t fully updated with the latest changes. Recently, I’ve made some significant updates, including configurable transcription services available in appsettings (such as Whisper, AssemblyAI, and AssemblyAI RealTime). Whisper is entirely offline, providing a local transcription option.

Additionally, I transformed the console client into a Windows Service with a Kestrel web server. I’ve also developed a new client with React and Rust, which has a polished UI inspired by Siri—it opens at the top of the screen and provides only two commands: start and stop, accessible with a single click. The interface shows different states like Idle, Listening, Processing, and Executing. I’ll be updating the build instructions soon, as I’m excited about how it turned out!

As for your idea of a text-to-text feature, I could integrate it into the client as an expandable panel under the main component to enable text-based interaction, which sounds like a great addition. I could work on this in the coming week.

One thing that would be helpful: if you know of any free TTS services that we could add, especially an offline solution, that would be ideal. For the LLM, I’m currently using Ollama with LLaMA 3.2.

Let me know what you think about these updates, and thank you again for your contribution!

joelmnz commented 1 week ago

@hysaordis, I also use Ollama and other local services, but sometimes you just need a text-in, text-out interface. If that's possible, then sweet!

As for TTS, yeah, you can use Chrome APIs, which could be enabled when running your web interface in Chrome.

Check out: https://developer.chrome.com/docs/extensions/reference/api/tts

I think you can do speech-to-text in the browser as well: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
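For the TTS side, here is a rough sketch of how a text reply could be spoken through the Web Speech API. The names `speakReply`, `SynthLike`, and `UtteranceLike` are made up for illustration and are not part of JarvisAi; the two interfaces only mirror the small part of the browser API used here, so the function can be tried outside a browser with a stand-in object.

```typescript
// Minimal shapes of the browser objects used below (assumption: only these
// members are needed). In a browser, window.speechSynthesis and
// SpeechSynthesisUtterance provide them.
interface UtteranceLike {
  text: string;
  rate: number;
}

interface SynthLike {
  speak(utterance: UtteranceLike): void;
  cancel(): void;
}

// Hypothetical helper: speak one text reply, replacing anything still queued.
function speakReply(
  reply: string,
  synth: SynthLike,
  createUtterance: (text: string) => UtteranceLike,
  rate: number = 1.0,
): UtteranceLike {
  const utterance = createUtterance(reply);
  utterance.rate = rate; // 1.0 is the normal speaking rate
  synth.cancel(); // drop speech left over from a previous reply
  synth.speak(utterance);
  return utterance;
}
```

In a browser, the call would look something like `speakReply(text, window.speechSynthesis, (t) => new SpeechSynthesisUtterance(t))`. Passing the synthesizer in as a parameter is just a convenience so the same function works with a fake when no browser is available.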

The React Rust combo sounds interesting.

I guess I'm looking for an easy way in to a local AI agent that's extendable to my workflow. If my suggestion doesn't fit with this project, that's all good. I much appreciate your work; it looks really well done.

Keen to know if you are using Aider, Cursor, etc. (AI dev tools), as I see your own agent as the next tool in a dev's tool belt.

Cheers