jackwuwei / gptspeaker

The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back. Like Apple Siri, Amazon Alex, Google Nest Home, Mi XiaoAi etc.
BSD 2-Clause "Simplified" License
41 stars 6 forks source link
ai chatbot chatgpt raspberry-pi smarthome speech-recognition speech-to-text tts voice-assistant

ChatGPT Voice Assistant

中文

2. Azure Cognitive Services

  1. Sign into your account at https://aka.ms/friendbot/azureportal.
  2. In the search bar at the top, enter Cognitive Services. Under Marketplace select Cognitive Services. (It may take a few seconds to populate.)
  3. Verify the correct subscription is selected. Under Resource Group select Create New. Enter a resource group name (e.g. conv-speak-rg).
  4. Select a region and a name for your instance of Azure Cognitive Services (e.g. my-conv-speak-cog-001).

    NOTE: EastUS, WestEurope, or SoutheastAsia are recommended, as those regions tend to support the greatest number of features.

  5. Click on Review + Create. After validation passes, click Create.
  6. When deployment has completed you can click Go to resource to view your Azure Cognitive Services resource.
  7. On the left side navigation bar, under Resourse Management, select Keys and Endpoint.
  8. Copy either of the two Cognitive Services keys. Save this key in a secure location for later.

    Windows 11 users: If the application is stalling when calling the text-to-speech API, make sure you have applied all current security updates (link).

OpenAI

The conversational speaker uses OpenAI's models to hold a friendly conversation. Below are the steps to create a new account and access the AI models. Supports OpenAI official API or Azure OpenAI API, just choose one.

1. OpenAI Account

  1. In a web browser, navigate to https://aka.ms/maker/openai. Click Sign up.

    NOTE: can use a Google account, Microsoft account, or email to create a new account.

  2. Complete the sign-up process (e.g., create a password, verify your email, etc.).

    NOTE: If you are new to OpenAI, please review the usage guidelines (https://beta.openai.com/docs/usage-guidelines).

  3. In the top-right corner click on your account. Click on View API keys.
  4. Click + Create new secret key. Copy the generated key and save it in a secure location for later.

    If you are curious to play with the large language models directly, check out the https://platform.openai.com/playground?mode=chat at the top of the page after logging in to https://aka.ms/maker/openai.

    2. Azure OpenAI Account

    Choose between OpenAI official account or Azure OpenAI account

    1. Create an Azure Account
      • If you don't have an Azure account, go to the Azure official website to sign up for an account. Azure offers a free account option, and new users can get a certain amount of free credits for testing and learning.
    2. Apply for Access
      • On the Azure OpenAI service page, click the "Apply for Access" button. This will take you to the application page where you need to fill in some necessary information, including your company name, use case, etc.
    3. Configure and Use
      • Once you have access, you can create a new OpenAI service resource in the Azure portal. After creation, you can get the API key and start using the Azure OpenAI service following the official documentation.

        The Code

        1. Code Configuration

    4. The Python Speech SDK package is available for Windows (x64 and x86), Mac x64 (macOS X version 10.14 or later), Mac arm64 (macOS version 11.0 or later), and Linux
    5. On the Raspberry Pi or your PC, open a command-line terminal.
    6. On Ubuntu or Debian, run the following commands for the installation of required packages:
      sudo apt-get update
      sudo apt-get install libssl-dev libasound2
    7. On Ubuntu 22.04 LTS it is also required to download and install the latest libssl1.1 package e.g. from http://security.ubuntu.com/ubuntu/pool/main/o/openssl/.
    8. Clone the repo.
      git clone https://github.com/jackwuwei/gptspeaker.git
    9. Set your API keys: Replace config.json {AzureCognitiveServices.Key}and {AzureCognitiveServices.Region} with your OpenAI API key and {OpenAI.Key} with your OpenAI API key.
      
      {
      "AzureCognitiveServices": {
      "Key": "AzureCognitiveServicesKey", 
      "Region": "AzureCognitiveServicesRegion",
      },
    "OpenAI": {
        "Key": "OpenAIKey", 
    },
    
    // Just choose one of the two OpenAI above
     "AzureOpenAI": 
     {
        "Key": "", // Key 1 or Key 2
        "api_version": "2024-02-01",
        "Endpoint": "", // Endpoint
        "Model": "" // Azure AI Studio deployment name 

    } }

    1. Install requirements
    ```bash
    pip3 -r install requirements.txt
    1. Run the code
      python3 gptspeaker.py

      2. (Optional) Create a custom wake phrase

      The code base has a default wake phrase ("Hey GPT") already, which I suggest you use first. If you want to create your own (free!) custom wake word, then follow the steps below.

  5. Create a custom keyword model using the directions here: https://aka.ms/hackster/microsoft/wakeword.
  6. Download the model, extract the .table file and copy it to source root directory.
  7. Update config.json file to include your wake phrase file in the build.
     "AzureCognitiveServices": {
        "WakePhraseModel": "xxx.table",
        "WakeWord": "xxx",
     }
  8. Rebuild and run the project to use your custom wake word.