moadel321 / voice-assistant


Voice Assistant Project

This project implements a voice assistant that transcribes speech to text using a Hugging Face inference endpoint, generates responses using OpenAI's GPT-4, and converts those responses back to speech using OpenAI's TTS API.

Features

Requirements

Setup

  1. Clone this repository
  2. Install required packages: pip install -r requirements.txt
  3. Set up your environment variables:
    • OPENAI_API_KEY: Your OpenAI API key
    • HUGGINGFACE_API_TOKEN: Your Hugging Face API token
    • ELEVENLABS_API_KEY: Your ElevenLabs API key (if using ElevenLabs for TTS)
  4. Ensure you have FFmpeg installed and available in your system PATH
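
Before the first run, it can help to confirm that the setup steps above actually took effect. The following is a minimal sketch, not part of the project itself: the helper name missing_settings is hypothetical, the variable names come from the list above, and treating ELEVENLABS_API_KEY as optional is an assumption based on the "if using ElevenLabs" note.

```python
import os
import shutil

# Names taken from the setup list above. ELEVENLABS_API_KEY is left out
# on the assumption that it is only needed when ElevenLabs is used for TTS.
REQUIRED_VARS = ("OPENAI_API_KEY", "HUGGINGFACE_API_TOKEN")


def missing_settings(env=None):
    """Return a list of setup problems; an empty list means nothing was found."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # shutil.which returns None when the executable cannot be found on the given PATH.
    if shutil.which("ffmpeg", path=env.get("PATH")) is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


for problem in missing_settings():
    print("Setup problem:", problem)
```

If the loop prints nothing, the two required variables are set and an ffmpeg executable is reachable. The check does not verify that the keys are valid.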

Usage

  1. Run the script: python voice_assist_poc.py
  2. Press and hold the spacebar to speak
  3. Release the spacebar to process the audio and get a response
  4. The assistant will transcribe your speech, generate a response, and speak it back to you
  5. Press 'q' to quit the program
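
The order of work in a single turn (transcribe, generate, speak) can be sketched as below. This is an illustration under stated assumptions rather than the code in voice_assist_poc.py: run_turn and its three callable parameters are hypothetical names, the real script calls the Hugging Face and OpenAI services at these points, and skipping the reply when nothing was transcribed is a design choice made for the sketch.

```python
from typing import Callable, Tuple


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> Tuple[str, str]:
    """Run one push-to-talk turn and return (transcript, reply)."""
    transcript = transcribe(audio)   # speech to text
    if not transcript.strip():
        # Assumption: nothing intelligible was recorded, so skip the model call.
        return transcript, ""
    reply = generate(transcript)     # text to response
    speak(reply)                     # response to speech
    return transcript, reply


# Stand-in callables so the sketch runs without any API keys.
spoken = []
print(run_turn(b"\x00\x01", lambda a: "hello", lambda t: t.upper(), spoken.append))
```

Passing the three stages in as callables keeps the turn logic testable offline and makes it easy to swap one TTS backend for another.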

Performance Logging

The script logs detailed performance metrics for each turn of the conversation and saves them to a CSV file for further analysis.
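
One way to collect per-turn timings and append them to a CSV file is sketched below. The column names in FIELDS and the helpers timed and append_metrics are hypothetical; the metrics the script actually records are not listed here, so treat this as an illustration of the approach only.

```python
import csv
import time
from contextlib import contextmanager
from pathlib import Path

# Hypothetical column names; the project's real metric list may differ.
FIELDS = ["turn", "transcription_s", "generation_s", "speech_s"]


@contextmanager
def timed(metrics, key):
    """Store the wall-clock duration of the enclosed block in metrics[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[key] = round(time.perf_counter() - start, 4)


def append_metrics(path, metrics):
    """Append one row per turn, writing the header only if the file is new or empty."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(metrics)  # columns missing from metrics are left blank
```

A turn would wrap each stage, for example `with timed(row, "transcription_s"): ...`, and then call `append_metrics("metrics.csv", row)` once the reply has been spoken. The file name is likewise a placeholder.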

Customization

Troubleshooting

Future Improvements

Contributing

Contributions to improve the voice assistant are welcome. Please fork the repository and submit a pull request with your changes.