Hark: The Ultimate Real-Time Speech-to-Text-to-LLM 🚀
Hark is your new favorite gadget for turning live audio into text, all while mingling with OpenAI’s GPT-4 for some extra brainpower! Whether you're capturing epic meetings or casual chats, Hark’s got you covered with its slick features and nerdy charm.
🌟 Key Features
- Real-Time Speech-to-Text-to-LLM: Watch in awe as live audio transforms into text instantaneously thanks to cutting-edge speech recognition.
- Multi-Language Support: Speak in your language of choice! Hark supports a ton of languages for flawless transcriptions.
- Interactive GPT-4 Integration: Chat with OpenAI’s GPT-4 for smart answers and insights that go beyond mere transcription.
- Meeting Summarization: Get concise summaries of your meetings that highlight all the important bits without the fluff.
- User-Friendly Interface: Big, friendly buttons for starting, stopping, and clearing recordings—perfect for all levels of tech wizardry.
🚀 Getting Started
Gear up and get ready to roll! Make sure you have Node.js and Yarn installed on your machine.
Installation
- Clone the repo:
  git clone https://github.com/us/hark.git
  cd hark
- Install dependencies (the first command is only needed if Yarn is not already installed):
  npm install --global yarn
  yarn
🎧 Usage Guide
Audio Input Setup
macOS (OS X)
To capture system audio on macOS, grab BlackHole—a nifty virtual audio driver.
- Install BlackHole: Download and install BlackHole.
- Create a Multi-Output Device:
  - Open Audio MIDI Setup.
  - Hit the + button and choose "Create Multi-Output Device."
  - Add your speakers and BlackHole to this device.
  - Set it as your system audio output.
- Set BlackHole as Input:
  - In Hark, select BlackHole from the audio input device dropdown.
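If you're curious what "select BlackHole from the dropdown" boils down to in a browser, here's a minimal sketch. It assumes the app lists devices with the standard navigator.mediaDevices.enumerateDevices() call; the helper names findInputDevice and openInputStream are hypothetical and are not taken from Hark's source, and the device labels are only illustrative.

```javascript
// Pick an audio input whose label contains a given fragment (case-insensitive).
// `devices` is expected to look like the result of
// navigator.mediaDevices.enumerateDevices(): objects with kind, label, deviceId.
function findInputDevice(devices, labelFragment) {
  const needle = labelFragment.toLowerCase();
  return (
    devices.find(
      (d) => d.kind === "audioinput" && d.label.toLowerCase().includes(needle)
    ) || null
  );
}

// Browser-only helper (hypothetical, not Hark's actual code): open a stream
// from the matched device so it can be handed to a recognizer.
async function openInputStream(labelFragment) {
  // Labels are usually empty until the user has granted microphone permission.
  const devices = await navigator.mediaDevices.enumerateDevices();
  const device = findInputDevice(devices, labelFragment);
  if (!device) {
    throw new Error(`No audio input matching "${labelFragment}" was found`);
  }
  return navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: device.deviceId } },
  });
}

// Example with mock data (labels are illustrative):
const sampleDevices = [
  { kind: "audioinput", label: "MacBook Pro Microphone", deviceId: "mic-1" },
  { kind: "audiooutput", label: "BlackHole 2ch", deviceId: "out-1" },
  { kind: "audioinput", label: "BlackHole 2ch", deviceId: "bh-1" },
];
console.log(findInputDevice(sampleDevices, "blackhole").deviceId); // prints "bh-1"
```

Matching on kind as well as label matters here, because a virtual driver can show up as both an output and an input with the same name.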
Windows
To achieve a similar setup on Windows, use Voicemeeter.
- Install Voicemeeter: Download and install Voicemeeter.
- Configure Voicemeeter:
  - Open Voicemeeter.
  - Set Hardware Input 1 as your default microphone and send it only to B.
  - Also, send the virtual input to both A and B (A for hearing through your default speakers, B for the virtual output).
  - Set Hardware Out A1 as your default output, typically your system speakers.
  - Double-check the Windows sound settings in the system tray to ensure Voicemeeter hasn’t changed your default speaker output. (Keep your usual speakers as the default output device, not Voicemeeter!)
- Configure Audio in Google Meet and Hark:
  - In Google Meet, set the input as your default mic and output as Voicemeeter Input.
  - In Hark, choose B1 as the input device.
Run the Application
Fire up your local server with:
yarn dev
Then check out your app at http://localhost:3000.
🔧 Configuration
📜 How It Works
- Select Audio Device: Choose BlackHole (macOS) or Voicemeeter B1 (Windows) to capture system sound.
- Start Recording: Hit "Start Recording" to capture and transcribe audio in real-time.
- Language Selection: Pick your preferred language from the dropdown.
- Ask GPT-4: Use "Answer the Latest Question" to get smart responses from GPT-4.
- Summarize Meeting: Click "What’s This Meeting About?" for a quick summary of your discussion.
- Stop Recording: End the session with "Stop Recording."
- Clear Results: Hit "Clear" to reset and prep for the next session.
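To make the "Answer the Latest Question" step a bit more concrete, here's a rough sketch of one way it could work: find the last question in the transcript, then wrap it in a request body for OpenAI's Chat Completions endpoint. The function names, the system prompt, and the sentence-splitting rule are all assumptions for illustration, not Hark's actual implementation. It also relies on the transcript containing punctuation, which many speech recognizers do not provide.

```javascript
// Return the last sentence in the transcript that ends with "?", or null.
// Assumption: the transcript is punctuated; unpunctuated text yields null.
function latestQuestion(transcript) {
  const sentences = transcript.match(/[^.!?]+[.!?]+/g) || [];
  for (let i = sentences.length - 1; i >= 0; i--) {
    const sentence = sentences[i].trim();
    if (sentence.endsWith("?")) return sentence;
  }
  return null;
}

// Build a Chat Completions request body for the latest question.
// The prompt wording is illustrative only.
function buildAnswerRequest(transcript, model = "gpt-4") {
  const question = latestQuestion(transcript);
  if (!question) return null;
  return {
    model,
    messages: [
      {
        role: "system",
        content: "You answer questions raised in a live meeting transcript.",
      },
      {
        role: "user",
        content: `Transcript:\n${transcript}\n\nQuestion: ${question}`,
      },
    ],
  };
}

// Send the request (not called here; needs a real API key and network access).
async function askModel(apiKey, body) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`OpenAI request failed: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}

const demo = "We shipped on Monday. When is the next release? Let me check.";
console.log(latestQuestion(demo)); // prints "When is the next release?"
```

Sending the whole transcript along with the question gives the model context, at the cost of longer requests as a meeting goes on; a real app would probably trim it to the most recent portion.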
🔮 Future Features
- Whisper Integration: We plan to add the Whisper API for even more accurate transcriptions. Note: It's heavier and slower, so our current system is still quicker.
- More Languages: Expanding language options to cover even more tongues.
- React UI Overhaul: A fresh, React-based UI to make the interface even more user-friendly.
- Local Speech-to-Text Models: Offline capabilities so you’re never left hanging.
- Expanded Model Support: Additional AI models for broader interaction possibilities.
🔍 Final Checklist Before Using Hark
Before you dive into using Hark, make sure you've completed these steps for a seamless experience:
- Audio Routing: Ensure that audio routing is correctly set up with BlackHole (or a similar virtual audio driver). BlackHole captures system audio, allowing Hark to process sound from other applications.
- Input Device Configuration: Verify that BlackHole is selected as the input device within Hark. This ensures the app captures all system sounds accurately.
- API Key Setup: Enter your OpenAI API key in app.js to enable GPT-4 interactions.
- Model Selection: Choose the appropriate GPT-4 model for your needs.
- Application Testing: Start listening with Hark, and test by asking questions to ensure everything works as expected.
By following these steps, you ensure that Hark is fully functional and ready to provide a smooth, real-time transcription and interaction experience.
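If you'd rather have the app nag you than remember the checklist, a tiny preflight check could look like the sketch below. The settings shape (apiKey, model, inputDeviceLabel) and the preflight function are hypothetical; Hark's real app.js may store these values differently.

```javascript
// Return a list of human-readable problems; an empty list means "good to go".
// Hypothetical settings shape, not taken from Hark's source.
function preflight(settings) {
  const problems = [];
  if (!settings.apiKey || settings.apiKey.trim() === "") {
    problems.push("OpenAI API key is missing");
  }
  if (!settings.model || !settings.model.startsWith("gpt-4")) {
    problems.push("No GPT-4 model selected");
  }
  if (!settings.inputDeviceLabel) {
    problems.push("No audio input device selected");
  }
  return problems;
}

// Example: the key has not been filled in yet.
console.log(
  preflight({ apiKey: "", model: "gpt-4", inputDeviceLabel: "BlackHole 2ch" })
); // prints [ 'OpenAI API key is missing' ]
```

This only checks that values are present. It cannot tell whether the key is valid or whether audio routing actually works, so the final "test by asking a question" step still applies.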
🤝 Contributing
Got ideas or want to help out? We’re all ears! Submit a pull request or open an issue to join the fun.
How this will help you:
- More feedback to fix and improve your project.
- New ideas about your project.
- Greater fame.
“Sharing knowledge is the most fundamental act of friendship. Because it is a way you can give something without losing something.”
— Richard Stallman
📜 License
This project is licensed under the GLWTPL (Good Luck With That Public License).
⚠️ Disclaimer
Hark is built for educational purposes only. If you choose to use it otherwise, the developers will not be held responsible. Please do not use it with evil intent.
📬 Contact
Questions, suggestions, or just want to chat? Hit us up at rahmetsaritekin@gmail.com.