Welcome to the CommanderGPT repository! This project harnesses the power of OpenAI's GPT-3.5 language model to enable seamless automation of your desktop tasks using voice commands. With a simple voice instruction, you can effortlessly control your desktop environment and accomplish a wide range of automation tasks.
Easy Voice Control: Command your desktop by speaking naturally. Use the provided hotword "commander" to activate the script, and effortlessly issue voice commands for various actions.
Bash Script Generation: The script leverages OpenAI's GPT-3.5 model to generate precise Bash scripts based on your voice commands. These scripts act as the bridge between your voice instructions and desktop automation.
Versatile Automation: Open applications, navigate through menus, simulate keyboard and mouse inputs, perform web searches, write code, save documents, and execute them — all through intuitive voice commands.
Interactive and Voice Modes: Switch between interactive mode, where you can enter commands directly, and voice mode, which allows for a more natural and hands-free interaction.
Enhanced Desktop Integration: The script intelligently switches between windows and ensures the appropriate actions are performed in the correct desktop environment.
Threat Intelligence: If the command will likely cause harm to your computer, It will ask you If you would like to proceed. You can adjust Threat barrier level.
Cross Platform: Works on Linux & Mac & Windows.
To utilize this CommanderGPT, ensure the following dependencies are installed.
Linux:
python3-devel
python-pyaudio
espeak
Mac:
Windows:
You will also need an OpenAI API key.
Clone this repository to your local machine.
Install the required dependencies by running the following command:
pip install -r requirements.txt
Rename config.yml.example
to config.yml
and Update the file with your OpenAI API key. You can customize other parameters if needed.
Execute the script using the following command:
python main.py
The script will actively listen for voice commands. Alternatively, you can switch to interactive mode by pressing the executing script with --interactive True
arg and typing commands manually.
Use the hotword "commander" to activate the script and provide voice commands. For example, say "commander, open the web browser" to launch the web browser.
The script utilizes OpenAI's GPT-3.5 model to generate Bash scripts based on your voice commands. These scripts will be executed, automating the desired tasks on your desktop environment.
To exit the script, either type "quit" or "exit," or say the hotword followed by "quit" or "exit" (e.g., "commander, exit").
Here are a few examples of voice commands you can use:
Example 1: Open Firefox, open a new tab, and search for a Python tutorial.
Example 2: Launch the Cheese App and take a picture.
Example 3: Open LibreOffice Writer, compose a Shakespearean poem, and save the document.
Feel free to experiment with different commands and explore the limitless possibilities of CommanderGPT!
https://github.com/theonlyfoxy/CommanderGPT/assets/12250394/8bed8d6f-46bb-4444-87a9-f015bcb9fbb4
Click the above to watch a video example showcasing the CommanderGPT in action.
Feedback from Desktop: Improve the script to capture and process feedback from the desktop environment, enhancing the accuracy and reliability of voice commands and automation tasks. (PyAutoGUI LocateonScreen, Teseract OCR & Current Mouse Position Relative to Button, Change X/Y Cordinates Based on Screenshot)
Additional Features: Explore and implement additional features to expand the capabilities of the CommanderGPT system.
This code is licensed under the MIT License.