vardhanam / RAG_Voice_Chatbot

A voice-to-voice chatbot for RAG on your documents using Qdrant

Voice RAG Chatbot

This project demonstrates a Voice RAG (Retrieval-Augmented Generation) chatbot that lets users query a large language model (Mistral-7B) by voice and receive spoken responses. The chatbot retrieves relevant passages from uploaded PDF documents and uses them as context for its answers. A short video demonstrating the UI is available at https://www.youtube.com/watch?v=DAJgfzRsfBs

Features

Requirements

Installation

  1. Clone the repository:

    git clone https://github.com/vardhanam/RAG_Voice_Chatbot.git
  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

Jupyter Notebook

  1. Open the Jupyter Notebook (app.ipynb) and execute the code cells.
  2. Upload one or more PDF files using the file uploader component.
  3. Click the "Submit Audio" button and speak your question into the microphone.
  4. The chatbot will transcribe your audio input, retrieve relevant context from the uploaded PDFs, generate a response, and play the audio response.
  5. To clear all inputs and outputs, click the "Clear All" button.
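
The flow in step 4 (transcribe, retrieve, generate, speak) can be sketched as a simple chain of stages. This is a minimal illustration, not the notebook's actual code: `answer_voice_query` and the four callables passed into it are hypothetical names, and the returned keys are chosen to mirror the `/process_audio` response described in the Flask API section.

```python
from typing import Callable, Dict, List


def answer_voice_query(
    audio_path: str,
    transcribe: Callable[[str], str],            # hypothetical: audio file -> question text
    retrieve: Callable[[str], List[str]],        # hypothetical: question -> relevant PDF passages
    generate: Callable[[str, List[str]], str],   # hypothetical: question + passages -> answer text
    synthesize: Callable[[str, str], str],       # hypothetical: answer text + filename -> audio path
    output_filename: str = "response.wav",
) -> Dict[str, str]:
    """Run one voice query through the four stages, in order."""
    question = transcribe(audio_path)
    passages = retrieve(question)
    response_text = generate(question, passages)
    audio_output = synthesize(response_text, output_filename)
    return {
        "transcription": question,
        "response_text": response_text,
        "audio_output": audio_output,
    }
```

Passing the stages in as arguments keeps the sketch independent of any particular speech, retrieval, or LLM library, so each stage can be replaced with a stub when testing.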

Flask API

  1. Start the Flask server:

    python flask_app.py
  2. The API provides the following endpoints:

    • /upload_pdfs (POST): Upload PDF files to create a knowledge base.

      • Request body: JSON object with the key file_paths containing an array of file paths.
      • Response: JSON object with a success message or an error message.
      • Example cURL command:
        curl -X POST \
        -H "Content-Type: application/json" \
        -d '{"file_paths":["path_to_file1.pdf","path_to_file2.pdf"]}' \
        http://localhost:5000/upload_pdfs
    • /process_audio (POST): Process audio input, generate a response, and return the response as audio.

      • Request body: JSON object with the keys audio_path (path to the audio file) and output_filename (desired filename for the generated audio response).
      • Response: JSON object with the keys transcription (transcribed text), response_text (generated response), and audio_output (path to the generated audio file).
      • Example cURL command:
        curl -X POST \
        -H "Content-Type: application/json" \
        -d '{"audio_path":"path_to_input_query.wav", "output_filename":"output_filename.wav"}' \
        http://localhost:5000/process_audio
  3. You can use tools like cURL or Postman to make requests to the API endpoints.
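
The same requests can also be made from Python. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:5000`, as in the cURL examples; the helper names (`build_request`, `post_json`, `upload_pdfs`, `process_audio`) are illustrative and not part of the repository. Because both endpoints take file paths rather than file contents, the paths are presumably resolved on the machine running the Flask server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # host and port taken from the cURL examples


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for an endpoint without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, base_url=BASE_URL):
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_pdfs(file_paths, base_url=BASE_URL):
    """POST /upload_pdfs with the documented `file_paths` key."""
    return post_json("upload_pdfs", {"file_paths": list(file_paths)}, base_url)


def process_audio(audio_path, output_filename, base_url=BASE_URL):
    """POST /process_audio with the documented request keys."""
    payload = {"audio_path": audio_path, "output_filename": output_filename}
    return post_json("process_audio", payload, base_url)
```

With the server running, `upload_pdfs(["path_to_file1.pdf"])` followed by `process_audio("path_to_input_query.wav", "output_filename.wav")` should return a dictionary with the `transcription`, `response_text`, and `audio_output` keys listed above.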

Functions

Gradio UI Components

Notes

Feel free to explore and interact with the Voice RAG Chatbot using either the Jupyter Notebook or the Flask API to experience voice-based conversational AI with retrieval-augmented generation capabilities!