omlins / JustSayIt.jl

Software and high-level API for offline, low latency and secure translation of human speech to computer commands or text on Linux, MacOS and Windows
BSD 3-Clause "New" or "Revised" License

Where to add custom @voiceargs #25

Open ecotechie opened 2 years ago

ecotechie commented 2 years ago

Hi, this is great! Never used Julia, but getting the hang of at least running this :smile:

I'm wondering if I missed the part where you mention where user-defined commands get added. I assume editing the files in the src directory is not a good practice.

omlins commented 2 years ago

Hi @ecotechie , good point, this was not yet documented! I have now added an example and corresponding documentation, see here: https://github.com/omlins/JustSayIt.jl#fast-command-programming-with-voice-argument-functions
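
For context, the pattern documented there looks roughly like this (a minimal sketch, not taken verbatim from the README; the function and its valid inputs are hypothetical):

using JustSayIt

# `direction` is obtained from speech recognition; `valid_input` restricts
# what the recognizer will accept for this argument.
@voiceargs direction=>(valid_input=["up", "down"]) function scroll(direction::String)
    println("Scrolling $direction...")   # placeholder for a real action
end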

Please let me know if this clarifies it! Don't hesitate to open other issues...

ecotechie commented 2 years ago

Thanks for adding that helpful info. I'm wondering if there is a good place in init_jsi.jl to add an ispath("~/.local/JustSayIt/config_custom_function.jl") check for a user-defined config file and, if true, include() that file? That way it would always be loaded on start and we wouldn't need to manually include the files. Maybe I'm missing the point of the API :smile:
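
For illustration, the kind of hook described here could look like this (a sketch only; the path is hypothetical, and note that ispath does not expand ~, hence the expanduser call):

# hypothetical auto-load hook, e.g. near the end of init_jsi.jl
custom_config = expanduser("~/.local/JustSayIt/config_custom_function.jl")
if ispath(custom_config)   # only include the file if the user created one
    include(custom_config)
end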

omlins commented 2 years ago

Sure, it makes sense to have something like that. I will add that, see #28 .

Maybe I'm missing the point of the API

My original idea for the API was to enable creating VUIs - Voice User Interfaces ;) - for applications with a very easy high-level API. However, the more direct application is definitely the one you have in mind: creating custom command functions to run JustSayIt with.

If you create some cool command functions, it would be nice if you could contribute them. :)

ecotechie commented 2 years ago

Thanks for the explanation and absolutely! I'd be happy to contribute whatever I add as custom commands. I'm thinking a robust set of keyboard shortcuts and then maybe some Vim ones too.

omlins commented 2 years ago

@ecotechie , there is now a function include_config to include custom functions or entire JustSayIt run scripts from the JustSayIt application config folder (e.g., ~/.config/JustSayIt on Unix systems). More information is available by typing ?include_config. This solution keeps user code outside the JustSayIt module, avoiding various issues...

omlins commented 2 years ago

#37 ...

omlins commented 2 years ago

@ecotechie , you can use this now:

julia> using JustSayIt

help?> @include_config
  @include_config(path::AbstractString)

  Prefix path with the JustSayIt application config path and then call
  include(path). If path is an absolute path, then call include with path
  unmodified.

  │ NOTE: JustSayIt application config
  │
  │  The content of the JustSayIt application config folder is not
  │  evaluated within JustSayIt. The folder's single purpose is to
  │  provide an easily accessible storage for scripts to start JustSayIt
  │  and/or for custom command functions: @include_config permits
  │  conveniently including files from this folder (for details about the
  │  Julia built-in include, type ?include). Your JustSayIt application
  │  config path on this system is: /home/omlins/.config/JustSayIt

shell> cat /home/omlins/.config/JustSayIt/test.jl
f() = 3

julia> @include_config("test.jl")
f (generic function with 1 method)

julia> f()
3
ecotechie commented 2 years ago

Thanks for all of this great work! I think I got it working as expected. Is there a way to autoload this custom config file if it exists? Not a deal-breaker; I'm just not that familiar with how Julia works. My ideal situation is that I can easily start JustSayIt and have my custom commands loaded as well. Though I'm happy to work in the command line too :smile:

Still looking at the code to make sense of it. #SelfTaughtDev

omlins commented 2 years ago

@ecotechie , my idea was that the user defines their personal mapping of command names to functions/shortcuts, where the functions can be taken from the submodules Keyboard, Mouse, Internet, etc., or written from scratch. Now, with the @include_config macro, you can easily load one of multiple favourite configs to start JustSayIt. If you prefer, you can start it all with a single command directly from a terminal, see here:

> ls /home/omlins/.config/JustSayIt/
start_weather.jl  test.jl

> julia -e 'using JustSayIt; @include_config("start_weather.jl")'
[ Info: Initializing JustSayIt (press CTRL+c to terminate JustSayIt)...
[ Info: No 'default' model found in its default location: downloading small english model (vosk-model-small-en-us-0.15.zip) from 'https://alphacephei.com/vosk/models' (~40 MB)...
Download (39.297 MiB): 100%|████████████████████████████| Time: 0:00:06
[ Info: No 'type' model found in its default location: download accurate large english model (vosk-model-en-us-daanzu-20200905.zip) from 'https://alphacephei.com/vosk/models' (~1 GB)?
Type "yes" or "no":
yes
Download (922.865 MiB): 100%|███████████████████████████| Time: 0:01:48
[ Info: Listening for commands...
[ Info: Starting command: help (latency: 66 ms)
┌ Info: 
│ Your commands:
│ help    => help
└ weather => weather
[ Info: Starting command: help (latency: 18 ms)
┌ Info: Command weather
│    =
│      weather `today` | `tomorrow`
│    
└      Find out how the weather is today or tomorrow.
[ Info: Starting command: weather (latency: 19 ms)
^C[ Info: Terminating JustSayIt...
[ Info: Stopped listening for commands.

Note that "start_weather.jl" is exactly this file here (just renamed): https://github.com/omlins/JustSayIt.jl/blob/main/config_examples/config_custom_function.jl

omlins commented 2 years ago

If you don't want to replace the default commands, but just add to them, you can merge yours with the defaults in your JustSayIt start config, as in this example:

julia> commands
Dict{String, Function} with 2 entries:
  "help"    => help
  "weather" => weather

julia> JustSayIt.DEFAULT_COMMANDS
Dict{String, Function} with 11 entries:
  "ma"       => click_left
  "middle"   => click_middle
  "email"    => email
  "right"    => click_right
  "select"   => press_left
  "help"     => help
  "okay"     => release_left
  "triple"   => click_triple
  "double"   => click_double
  "internet" => internet
  "type"     => type

julia> commands = merge(commands, JustSayIt.DEFAULT_COMMANDS)
Dict{String, Function} with 12 entries:
  "ma"       => click_left
  "middle"   => click_middle
  "email"    => email
  "right"    => click_right
  "select"   => press_left
  "help"     => help
  "okay"     => release_left
  "triple"   => click_triple
  "double"   => click_double
  "internet" => internet
  "weather"  => weather
  "type"     => type

Do you think some of this should be more automatic?

omlins commented 2 years ago

@ecotechie : note the change in default behaviour from max speed to max accuracy. See PR #45 :

This PR removes the kwargs use_max_accuracy and max_accuracy_subset package-wide and replaces them with the inverse kwargs use_max_speed and max_speed_subset.
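
In practice this means maximum accuracy is now the default and speed has to be requested explicitly. A minimal sketch of how that might look, assuming the kwarg name from the PR (check ?start for the authoritative signature):

julia> using JustSayIt

julia> start(use_max_speed=true)   # opt back in to the previous max-speed behaviour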

damian-666 commented 11 months ago

A couple of things towards the ultimate hands-free UI: I spec'd it out to some people at Microsoft as a series of challenging tweets, and found out they are working on a preview for insiders (I'm not one) called Windows Copilot, and Cortana is retired, FINALLY. So they should unify the Start menu they ruined and have a zero-button UI with an indexed API surface map: an input line at top left for apps and at bottom right for the OS.

A quick thing you could do is look at Microsoft's Silk project and see how their API scrapers work to generate the domain-specific mappings, and the types and parameters, maybe to automate building that part. I could sponsor this, but right now I'm too busy to do it. I'm still learning which chatbot codes best and how it can use context, but not with voice yet; none have that...

Part of the reason WW2 planes crashed is that each model had a very different user interface, and that's also why computers are so painful to use. So this, via synonyms and homologies, would also give Linux and Windows users the ability to use some other computer or switch back and forth.

What they did well (99% of the way to the finish line, but not quite) with Developer Studio is the code/feature search: a great feature that indexes all the bloated UI and combines the discoverability of menus with the random access of the command line, which I can never remember (and I can't use someone else's computer; the keyboard shortcuts get reset, etc.). So: combine code/feature search, make that the app start menu, and scrape the API surface of the IDE (Silk has methods for many APIs), COM and automation; scriptable apps are the easiest, but they can scrape C and C++ externs too.

When you start typing, say, 'text size', it will show some UI. With Voice Assistant, if there are multiple matches it will put a little teardrop icon marker with a 1, 2, 3, so you can say '1' or '2', or 'Click 2'; it has to be exact and it's terrible. I suggested 'Alpha, Bravo, Charlie' or something, so that I can speak quickly and interject 'Alpha' as soon as I get a visual; that is clearer, as I'm using a hotword, whereas 1, 2, 3 are too common in code or values.

'Set line spacing to 1', for example, should just do that: it brings up the UI and visually shows the UI doing it. It might talk back to say what it's doing, and rarely (in the case of uncertainty or possible data loss) ask for confirmation, but it's not about undo, just logging what it did; I might not even be looking.

The code and feature searches are separate (separate, so already a huge barrier to use; it should figure that out from the context and put guess markers on the UI as you go), and it's not hands-free.

One must collect the words and types, then use that to build the dictionary that helps the voice-to-text limit the domain for better guessing. Then, if the user says 'computer, launch devstudio' (and once devstudio is active I don't need to use its hotword), it might use the open files, the current project, or whatever, to set a search strategy.

'File, Open Recent' and this index work visually: as soon as I say 'File', it might already put an 'Alpha' marker on that recent file. If I say 'File, Alpha', it will load the last workspace; if I say 'File, Open, Alpha', it will have shown a file-open box, starting with a line of markers down the list. If I say 'File, Open, Liz', any projects or files in there called Lizard or Lazi will get marks. If I'm not looking, I'll have to say the file name until it's pretty certain, and then it will say 'opening Solution Lizard' or something. Then I'll say 'Build and debug'.


Windows Voice Assistant is better, but it's built over the old UI and it doesn't really work at all: no synonyms, you have to say things exactly. It's unusable.

So, in LLMs there is a way to reduce the domain; I forget what they call it. I don't know if they will screw this up. I've not seen domain-specific LLMs yet; some are using the API/codebase you are working with, or your history. Voice recognition isn't.

Nearly all Windows applications have a COM-based automation object model, and a plugin can access them, so someone could build this, but I thought Microsoft should do it for Start, and make an SDK so someone can use it to scrape APIs, get those tokens and those synonyms, and generate a query language to report the values too. I could ask: 'Is my Just My Code set to ON, True, or Checked?', and it should bring up the UI, or maybe put the answer in the textbox: 'Yes, App.Options.Debug.JustMyCode = true'.

So you could also say to the OS (hotword 'computer', or synonyms or others: 'Linux', 'Windows', 'HAL', 'Mother' like in Alien), 'FileManager', or 'Explorer' (since it's called Explorer.exe in the commands folder): 'order all my files by size'. The dictionary would have been set up by scanning for all apps and commands (cmd.exe, chkdsk.exe, ...), then generating the lists from that, then text-to-SQL parts for the query. The LLMs are good at context right now, and domain reduction within context might be happening, but I don't know if that is happening with voice recognition yet. Regardless, it might have high latency and not be good.

I can show you my tweet rant if something is unclear but sounds agreeable, but I think this would be the ultimate UI, one that fits all users: hands-free, or typing the least amount with completion. I told them to just leave the existing UI mess there, put this over it, and save on all those endless UI meetings and user options. I don't ever want to customize my UI, really, if I don't have to; I usually build a domain-specific tool.

I hope this helps; they might be doing it. I have tried JustSayIt, but I don't have Ubuntu, so I didn't get very far; it seems pretty fast. Maybe tie it to existing LLMs or voice recognizers, or both, and scrape the APIs. I don't know; it will come out in a few months, I think, and its current state, I'm told, is a mess. If Voice Assistant is any indication, they will build a layer over the UI instead of the AI, not as I advised. I'm left-handed and even that is painful: left click, right click, etc.

Hope this helps. I'm hoping to command and query an AI-driven PC while looking at a 4K monitor or doing generative CAD design while tidying my room, holding a karaoke cardioid mic or wearing a headset. Let me know if you don't think this is the idea everyone would like; to me it seems just like talking to an app, which I already see some of. Later I'll just talk to it without looking.

damian-666 commented 11 months ago

https://github.com/dotnet/Silk.NET/blob/main/src/Windowing/Silk.NET.GLFW/Enums/GamepadButton.cs (an example of a hand-coded one). For the IL language there is a spy; https://clang.llvm.org/ is, I think, what they use in Silk. And the proposal may not be implemented for the full envisioned scraper.

When talking about graphics like OpenGL, Vulkan or DirectX, they use a generator and commit the APIs and metadata periodically. Existing code-completion modules, maybe VS-style schemes, might be able to feed back to the voice module.

But to run a GUI or OS, it can look at the apps' plugin surface APIs and object models. It could be crowdsourced once it's set up, but it's a lot of work. I can guess what MSFT is actually doing, but I don't have high confidence. I'll be testing Copilot against existing APIs: it's good with APIs it was trained on two years ago; in my last experience it didn't compile unless it was a dead or slow project, and for active open source, reflecting on NuGets and having to build libs costs a lot of tokens. It will need to reflect, or scrape, or spy.

https://github.com/dotnet/Silk.NET/blob/7a042d118ac7b954a617f8b275e3d7320e9784be/documentation/proposals/Proposal%20-%20Generation%20of%20Library%20Sources%20and%20PInvoke%20Mechanisms.md?plain=1#L30

One of the bots might use a visitor pre-parser or API spy, https://lablab.ai/tech/anthropic/claude ; not sure which one, I'm trying several.