# Voice RAG Chatbot

This project demonstrates a Voice RAG (Retrieval-Augmented Generation) chatbot that lets users interact with a large language model (Mistral-7B) using voice input and receive spoken responses. The chatbot retrieves relevant information from uploaded PDF documents to provide context-aware answers. A short video demonstrating the UI is available here: https://www.youtube.com/watch?v=DAJgfzRsfBs
## Installation

1. Clone the repository:

```shell
git clone https://github.com/vardhanam/RAG_Voice_Chatbot.git
```
2. Install the required dependencies:

```shell
pip install -r requirements.txt
```
## Usage

Open the Jupyter Notebook (`app.ipynb`) and execute the code cells, or start the Flask server:

```shell
python flask_app.py
```
## API Endpoints

The API provides the following endpoints:

### `/upload_pdfs` (POST)

Uploads PDF files to create a knowledge base. The request body is a JSON object with a `file_paths` key containing an array of file paths.

```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"file_paths":["path_to_file1.pdf","path_to_file2.pdf"]}' \
  http://localhost:5000/upload_pdfs
```
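For orientation, here is a minimal sketch of what a handler for this endpoint could look like in Flask. The `index_pdfs` helper is a hypothetical stand-in for the repo's `add_pdfs_to_vectorstore` function, not the actual implementation.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def index_pdfs(file_paths):
    # Hypothetical stand-in: the real app parses each PDF, splits it
    # into chunks, and adds the chunks to a Qdrant vector store.
    return len(file_paths)

@app.route("/upload_pdfs", methods=["POST"])
def upload_pdfs():
    body = request.get_json(force=True)
    file_paths = body.get("file_paths", [])
    if not file_paths:
        # Reject requests that omit the documented file_paths key.
        return jsonify({"error": "file_paths is required"}), 400
    count = index_pdfs(file_paths)
    return jsonify({"message": f"Indexed {count} PDF file(s)"})
```

Such a handler can be exercised without a network round trip via Flask's built-in `app.test_client()`.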
### `/process_audio` (POST)

Processes audio input, generates a response, and returns the response as audio. The request body is a JSON object with `audio_path` (path to the audio file) and `output_filename` (desired filename for the generated audio response). The response contains `transcription` (transcribed text), `response_text` (generated response), and `audio_output` (path to the generated audio file).

```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"audio_path":"path_to_input_query.wav", "output_filename":"output_filename.wav"}' \
  http://localhost:5000/process_audio
```
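The request and response schema above can be mirrored in a minimal handler sketch. The `fake_transcribe`, `fake_answer`, and `fake_tts` stubs below are placeholders for the repo's real Whisper, Mistral-7B, and TTS calls; only the control flow and the JSON keys match the documented endpoint.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder pipeline stages; the real app runs Whisper ASR,
# the Mistral-7B model, and a TTS model here.
def fake_transcribe(audio_path):
    return f"transcript of {audio_path}"

def fake_answer(question):
    return f"answer to: {question}"

def fake_tts(text, output_filename):
    return output_filename  # the real function writes a .wav file

@app.route("/process_audio", methods=["POST"])
def process_audio():
    body = request.get_json(force=True)
    transcription = fake_transcribe(body["audio_path"])
    response_text = fake_answer(transcription)
    audio_output = fake_tts(response_text, body["output_filename"])
    # Same three keys the endpoint documentation lists.
    return jsonify({
        "transcription": transcription,
        "response_text": response_text,
        "audio_output": audio_output,
    })
```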
You can use tools like cURL or Postman to make requests to the API endpoints.
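If you prefer Python over cURL, the same requests can be made with the standard library alone. This sketch assumes the Flask server is running locally on port 5000; the live calls are left commented out so the snippet is safe to import.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumes the Flask server is running

def build_payload(data: dict) -> bytes:
    """Serialize a request body the same way the curl examples do."""
    return json.dumps(data).encode("utf-8")

def post_json(endpoint: str, data: dict) -> dict:
    """POST a JSON body to an endpoint and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + endpoint,
        data=build_payload(data),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server to be running):
# post_json("/upload_pdfs", {"file_paths": ["path_to_file1.pdf"]})
# post_json("/process_audio",
#           {"audio_path": "path_to_input_query.wav",
#            "output_filename": "output_filename.wav"})
```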
## Core Functions

- `load_llm()`: Loads the Mistral-7B language model for text generation.
- `text_splitter()`: Creates a `RecursiveCharacterTextSplitter` for splitting long documents into chunks.
- `add_pdfs_to_vectorstore(files)`: Processes uploaded PDF files and adds them to the Qdrant vector store.
- `answer_query(message)`: Retrieves relevant context based on the user's question and generates a response using the language model.
- `transcribe(audio)`: Transcribes the user's audio input into text using the Whisper ASR model.
- `generate_and_play_audio(text, filename)`: Converts the generated response into audio using the WhisperSpeech TTS model and saves it to a file.

## UI Components

- `upload_files`: File uploader for uploading PDF files.
- `success_msg`: Text component that displays a success message after files are uploaded.
- `audio_inp`: Audio component for capturing the user's voice input.
- `trans_out`: Textbox component that displays the transcribed text.
- `btn_audio`: Button component that triggers audio transcription.
- `model_response`: Textbox component that displays the chatbot's generated response.
- `audio_out`: Audio component that plays the generated audio response.
- `clear_btn`: Button component that clears all inputs and outputs.

Feel free to explore and interact with the Voice RAG Chatbot using either the Jupyter Notebook or the Flask API to experience voice-based conversational AI with retrieval-augmented generation capabilities!
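To make the retrieve-then-generate flow of `answer_query` concrete, here is a self-contained toy sketch: a naive keyword-overlap retriever stands in for the Qdrant vector store, and a string template stands in for Mistral-7B, so only the control flow matches the real code.

```python
# Toy retrieve-then-generate pipeline mirroring answer_query's control flow.

CHUNKS = [
    "Qdrant is a vector database used to store document embeddings.",
    "Mistral-7B is a 7-billion-parameter open-weight language model.",
    "Whisper is an automatic speech recognition model.",
]

def retrieve(question, chunks, k=1):
    """Rank chunks by word overlap with the question (stand-in for
    vector similarity search against Qdrant)."""
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(question, context):
    """Stand-in for the Mistral-7B call: echo context into an answer."""
    return f"Based on the context ({' '.join(context)}), answering: {question}"

def answer_query(message):
    # Retrieve relevant chunks, then generate a grounded response.
    context = retrieve(message, CHUNKS)
    return generate(message, context)
```

The real `answer_query` swaps the keyword overlap for embedding similarity and the template for an actual LLM call, but the two-step shape is the same.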