Closed by suyashgautam 4 days ago
These changes rename the "Transcription" project to "Real-Time Transcription Service" and expand it substantially. Key updates include new configuration files and client and server support for real-time audio transcription over WebSocket, with buffering strategies, voice activity detection (VAD), and automatic speech recognition (ASR). The PR also adds project documentation, setup instructions, and unit tests.
File(s) | Change Summary |
---|---|
Transcription/.env, Dockerfile, requirements.txt | Added new environment variables, Docker configurations, and updated dependencies for the project. |
Transcription/.gitignore | Introduced to exclude .idea directory and .env files. |
Transcription/.idea/... | Several files added for PyCharm project configuration. |
Transcription/Dockerfile | Set up a container environment with NVIDIA CUDA, Python, and necessary libraries. |
client/VoiceStreamAI_Client.html, utils.js | Added real-time audio transcription client interface and WebSocket communication utilities. |
docker-compose.yaml | Configuration for ui and backend services, including build contexts and port mappings. |
src/asr/... | Introduced ASR factory and interface, and specific implementations for Whisper and FasterWhisper ASR. |
src/audio_utils.py | Added utility functions for saving audio data to file. |
src/buffering_strategy/... | Introduced buffering strategies, including a factory and interface for audio chunk processing. |
src/client.py, room.py, server.py | Implemented client, room, and server classes for handling WebSocket connections, audio processing, and VAD. |
src/database.py | Manages interactions with SQLite and Minio for audio and transcription file storage. |
src/main.py | Main server functionality for real-time audio transcription. |
src/transcription_utils.py | Utility functions for managing and processing transcription files. |
src/vad/... | Implemented VAD factory and specific VAD class using Pyannote library. |
start.sh | Script to check for environment variables and start the application. |
test/vad/test_pyannote_vad.py | Unit tests for PyannoteVAD functionality. |
README.md | Comprehensive project documentation, including features, setup, and usage instructions. |
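
The factory-plus-interface layout listed for `src/asr/...`, `src/buffering_strategy/...`, and `src/vad/...` might look roughly like the sketch below. It is only an illustration of the pattern: `ASRInterface`, `EchoASR`, `ASRFactory`, and the `"echo"` key are hypothetical names, not taken from the PR, and the real backends would wrap Whisper or FasterWhisper rather than echoing a byte count.

```python
from abc import ABC, abstractmethod


class ASRInterface(ABC):
    """Hypothetical interface: turn raw audio bytes into text."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        ...


class EchoASR(ASRInterface):
    """Stand-in backend for illustration only; not a real recognizer."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


class ASRFactory:
    """Maps a configuration string to a backend class (assumed design)."""

    _registry = {"echo": EchoASR}

    @classmethod
    def create(cls, name: str, **kwargs) -> ASRInterface:
        try:
            backend = cls._registry[name]
        except KeyError:
            raise ValueError(f"Unknown ASR backend: {name!r}") from None
        return backend(**kwargs)
```

With this shape, the server would depend only on `ASRInterface`, so swapping backends becomes a configuration change rather than a code change.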
sequenceDiagram
    participant Client as Client
    participant WebSocket as WebSocket Server
    participant VAD as Voice Activity Detection
    participant ASR as Automatic Speech Recognition
    participant DB as Database
    Client->>WebSocket: Connect
    WebSocket->>Client: Connection Acknowledgement
    Client->>WebSocket: Start Streaming Audio
    loop Real-Time Processing
        WebSocket->>VAD: Send Audio Data
        VAD->>WebSocket: Send Detected Segments
        WebSocket->>ASR: Send Valid Audio Segments
        ASR->>WebSocket: Send Transcriptions
    end
    WebSocket->>DB: Save Transcriptions
    WebSocket->>Client: Update Transcriptions
    Client-->>WebSocket: Stop Streaming Audio
    WebSocket->>Client: End Session
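
The message flow in the diagram can be sketched as a plain function, leaving out the WebSocket transport so the ordering of VAD, ASR, and storage is easy to follow. Everything here is an assumption for illustration: `run_session` and its callable parameters are hypothetical names, and the stand-ins in the usage lines (all-zero chunks treated as silence, ASCII decoding as "transcription") are placeholders, not the PR's Pyannote or Whisper logic.

```python
from typing import Callable, Iterable, List


def run_session(
    chunks: Iterable[bytes],
    detect_speech: Callable[[bytes], List[bytes]],
    transcribe: Callable[[bytes], str],
    save: Callable[[List[str]], None],
) -> List[str]:
    """Follow the diagram: VAD then ASR inside the loop, one save at the end."""
    transcripts: List[str] = []
    for chunk in chunks:                             # streamed audio data
        for segment in detect_speech(chunk):         # VAD: detected segments
            transcripts.append(transcribe(segment))  # ASR: transcription
    save(transcripts)                                # persist transcriptions
    return transcripts                               # sent back to the client


# Usage with placeholder components (assumptions, not the real VAD/ASR).
stored: List[str] = []
result = run_session(
    chunks=[b"\x00\x00", b"hi", b"there"],
    detect_speech=lambda c: [c] if any(c) else [],  # all-zero chunk = silence
    transcribe=lambda seg: seg.decode("ascii"),
    save=stored.extend,
)
```

In the actual service each step would presumably be asynchronous and run per connection; the sketch only fixes the order of operations.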
In a world where words take flight,
Real-time whispers, day and night,
Transcribe and save each spoken bit,
With WebSockets and streaming wit.
Voices captured, stored with care,
Minio holds them, floating in air.
For every change, we sing with glee,
A rabbit's joy, code running free! 🐰✨
@coderabbit review