omegaui / linux-voice-control

Your personal, fully customization, Linux Voice Control Assistant.
GNU General Public License v3.0
129 stars 7 forks source link
ai desktop-voice-control linux voice-control

Watch the CLI version in action on youtube

Linux Voice Control

See it in action

Latest Preview (No-Audio)

See YouTube Video

Features

Pro-Tip: Say 'See you later' to it turn off.

Pro-Tip: Say 'Activate master control mode' to turn on master control mode without manual config.

Pro-Tip: Say 'Deactivate master control mode' to turn off master control mode without manual config.

Yes, these are the built-in actions!

Master Control Mode

Ok! ;)

Let's just see how to set up master control mode:

After installing, run the master_control_mode_setup.py script

python3 master_control_mode_setup.py

You will be asked to speak three times given only 3 seconds each time. Speak as much as you can but in your normal tone! After Saving the training-data, this program exists without actually enabling master control mode.

For enabling master control mode, you need to set this property in lvc-config.json to true OR Even you can just say activate master control mode during the runtime to enable it dynamically for that session only 😉!

{
  "master-mode": true
}

Usage / Install

Lets quick finish the setup

git clone https://github.com/omegaui/linux-voice-control
cd linux-voice-control
./install.sh

The install.sh script will set up your installation of linux-voice-control by installing some dependencies with pip.

The above process will finish off writing lvc-config.json and lvc-commands.json file to your root (~) and also the sources to ~/lvc-bin. That's how your lvc-config.json look initially ...

{
  "name": "alex",
  "greeting": "Greetings!",
  "record-duration": 3,
  "channels": 2,
  "rate": 44100,
  "chunk-size": 1024,
  "notifications-enabled": false,
  "show-commands-on-startup": true,
  "logs": true,
  "speech-threshold": 2500,
  "live-mode": false,
  "use-hot-word-in-basic-mode": false,
  "hot-words": [
    "hey alex",
    "here alex",
    "listen alex"
  ],
  "master-mode": false,
  "master-mode-barrier-speech-enabled": true,
  "master-mode-barrier-speech": "Unauthorized",
  "voice-pitch": 1.0,
  "voice-feedback-enabled": true,
  "voice-transcription-feedback-enabled": false,
  "voice-feedback-speed": 1.35,
  "voice-cache-enabled": true,
  "voice-feedback-default-speeches": [],
  "voice-feedback-transcription-capable-speeches": [
    "transcribing...",
    "getting it..."
  ],
  "voice-feedback-turning-off": "Turning off linux voice control!"
}
lvc-config.json

Let's take a look at the game controller ... lvc-commands.json here, blocking property means that whether the feedback must go on simultaneously with command execution (if set to false) or it should first complete the voice feedback then execute the command (if set to true).

{
  "open firefox": {
    "exec": "firefox",
    "feedback": "starting firefox",
    "blocking": true
  },
  "open editor": {
    "exec": "gedit",
    "feedback": "launching editor",
    "blocking": true
  },
  "lock the screen": {
    "exec": "xdg-screensaver lock",
    "feedback": "locked",
    "blocking": false
  },
  "open whatsapp": {
    "exec": "firefox https://web.whatsapp.com",
    "feedback": "opening whatsapp",
    "blocking": true
  },
  "open instagram": {
    "exec": "firefox https://instagram.com",
    "feedback": "opening instagram",
    "blocking": true
  },
  "write an email": {
    "exec": "firefox https://mail.google.com/mail/u/0/#inbox",
    "feedback": "opening G Mail",
    "blocking": true
  }
}
lvc-commands.json

You must be able to infer from above by now, that the keys are your speeches actually and their values are the corresponding commands that are to be executed each time you name those keys.

But wait ... understand how it's actually working in the background.

Let's take an example how it accurately recognizes your commands and your actual speeches.

First if you're really feeling happy to know about this project, then, you must thank openai's whisper it's actually the root and the stem of this project being a reality. Want to know why? Just click on the link above.

After getting the audio transcription, a fuzzy match is applied against the keys in the lvc-commands.json.

Just like you search something on Google and getting the correct results even after typing the wrong spelling.

Your installation is all set for usage (for forking too 😉). Just hit linux-voice-control in the terminal.

Life Saver Tip: Add linux-voice-control-gui to your startup scripts

Future Ideas: Adding Dynamic Voice Control like hey alex ... call Spider-Man (you already know things like that) and most important things discussions

Build from source

Super simple things you know ...

git clone https://github.com/omegaui/linux-voice-control
cd linux-voice-control
pip install -r requirements.txt
python3 main.py

If you are unable to build pyaudio, see this post.

If you are encountering errors with MPV:

If you are encountering warnings with ALSA saying unable to open slave or Unknown {device}:

pcm.default cards.pcm.default pcm.sysdefault cards.pcm.default pcm.front cards.pcm.front

pcm.rear cards.pcm.rear

pcm.center_lfe cards.pcm.center_lfe

pcm.side cards.pcm.side

pcm.surround21 cards.pcm.surround21

pcm.surround40 cards.pcm.surround40

pcm.surround41 cards.pcm.surround41

pcm.surround50 cards.pcm.surround50

pcm.surround51 cards.pcm.surround51

pcm.surround71 cards.pcm.surround71

pcm.iec958 cards.pcm.iec958 pcm.spdif iec958 pcm.hdmi cards.pcm.hdmi pcm.dmix cards.pcm.dmix pcm.dsnoop cards.pcm.dsnoop pcm.modem cards.pcm.modem pcm.phoneline cards.pcm.phoneline


Then, everything would work fine.

# ![](https://img.icons8.com/color/32/000000/approval--v1.png) Ok Tested!
![img.png](images/img.png)
<div align="center">
    <em>
        The project is finely tested on <strong>fedora 36 Workstation & Ubuntu 22.04</strong>
    </em>
</div>

# ![](https://img.icons8.com/external-flat-geotatah/32/000000/external-expand-startups-flat-flat-geotatah.png) Extending Usage

The values in `lvc-commands.json` seem just like single commands running when matching key is triggered.

But wait ... it doesn't ends here.

You can even do more with this, i.e. you can write a shell script or any other program, that can be used to
perform some other complex task.

### Example
```json
{
  "its time to rock": {
    "exec": "/usr/bin/setup-spotify-chill-mode"
  }
}
A new entry in lvc-commands.json
#!/bin/bash
echo "Listen I don't know actually how to do this ... but I think you got what I mean ..."
# some code here that launches spotify and starts playing that playlist
setup-spotify-chill-mode.sh

🐧 Happy Hacking 🐧