leetcode-mafia / cheetah

Mac app for crushing remote tech interviews with AI

Transcribed audio not showing #35

Open · iamlokeshvunnam opened 1 year ago

iamlokeshvunnam commented 1 year ago

Thanks a lot for this repo!

I cloned the Cheetah repo and the whisper.cpp repo (both at the same level), followed the instructions to download the ggml model, brew-installed SDL2, and installed BlackHole. But when I run the Cheetah project, no transcribed text appears in the window.
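For reference, my setup steps were roughly the following (I installed BlackHole through Homebrew's blackhole-2ch cask):

```sh
# clone both repos side by side
git clone https://github.com/leetcode-mafia/cheetah
git clone https://github.com/ggerganov/whisper.cpp

# dependencies mentioned in the setup instructions
brew install sdl2
brew install blackhole-2ch
```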

When running the project for the first time, it couldn't find the model at '/Users//Library/Caches/org.phrack.Cheetah/ggml-medium.en.bin', so I copied the downloaded ggml model from whisper.cpp/models/ to that location so Cheetah could find it.
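Concretely, this is roughly what I did to put the model in place (the download script ships with whisper.cpp; the cache path is the one from the error above):

```sh
# fetch the medium English model with whisper.cpp's helper script
cd whisper.cpp
bash ./models/download-ggml-model.sh medium.en

# copy it to where Cheetah looks for it
mkdir -p ~/Library/Caches/org.phrack.Cheetah
cp models/ggml-medium.en.bin ~/Library/Caches/org.phrack.Cheetah/
```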

[Screenshots attached: Cheetah window output, 2023-07-12 00:20 and 00:21]

What am I missing? Do I need to be in a meeting with someone to test this out? What should the input source be? Please clarify. Thanks a lot!

leetcode-mafia commented 1 year ago

It looks like you might be running in debug mode and/or with a debugger attached? If so, that won't work because Whisper runs too slowly without compiler optimizations.

If that is not the issue, then the [BLANK_AUDIO] tokens suggest the app is capturing an audio stream, but no input device is sending audio to BlackHole, so all it hears is silence. Before trying to get BlackHole to work, you'll want to make sure Cheetah works with the built-in mic input.
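One way to take Cheetah out of the equation entirely is to run whisper.cpp's stream example against the same model and your built-in mic (the stream example is the part of whisper.cpp that needs SDL2). A rough sketch, with flags per the whisper.cpp README:

```sh
# build and run whisper.cpp's real-time mic transcription demo
cd whisper.cpp
make stream
./stream -m models/ggml-medium.en.bin -t 8
```

If that transcribes your speech, the model and mic are fine and the problem is on the Cheetah side.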

iamlokeshvunnam commented 1 year ago

I'm not sure whether debug mode is on. As soon as the Xcode project opened, I clicked the 'play' button, which I assumed runs the application in release mode. Is that not the case?

Please help, thanks!

leetcode-mafia commented 1 year ago

No, that will run it in debug mode. You're looking for: Product > Build for > Profiling, then Product > Show Build Folder.
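Equivalently, from the command line (the project and scheme names here are my guess; check yours with `xcodebuild -list`):

```sh
# build an optimized (Release) binary instead of the default Debug one
xcodebuild -project Cheetah.xcodeproj -scheme Cheetah -configuration Release build
```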

iamlokeshvunnam commented 1 year ago

Oh, thanks a lot. I've now built the application and am running it in release mode. I still don't see any transcriptions (the input source is set to 'MacBook Air Microphone'). Any idea why?

iamlokeshvunnam commented 1 year ago

Yep, I found the reason. The model that was put in the cache folder was somehow corrupted; replacing it with the ggml model from whisper.cpp did the trick. It works now, although I still have to get BlackHole working (I'd appreciate your help on that one).
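For anyone else hitting this: a quick way to confirm the cached copy is bad (just a sanity check I'd suggest, not something the app does) is to compare checksums against the freshly downloaded file:

```sh
# the two hashes should match; if they don't, the cached copy is damaged
shasum whisper.cpp/models/ggml-medium.en.bin ~/Library/Caches/org.phrack.Cheetah/ggml-medium.en.bin
```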

Thanks a lot. Also, the transcriptions seem a bit slow; what would you advise for faster transcriptions? Have you tried the quantized models, and do they produce reasonably good transcriptions? Any other suggestions?

leetcode-mafia commented 1 year ago

What hardware are you using? It can only run fast enough on an M1 or M2.
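(If you're not sure which chip you have, this prints it in Terminal:)

```sh
# prints the CPU name, e.g. "Apple M1"
sysctl -n machdep.cpu.brand_string
```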

Even with beefy hardware, there's still a minor delay in generating transcriptions. The main reason is that the current algorithm for buffering/chunking the audio stream isn't optimal and needs further tuning.

M1/M2 is fast enough to run the medium model in real-time, so I don't think using quantized or smaller models would make a significant difference.
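If you do want to experiment with quantization anyway, whisper.cpp ships a quantize tool; a rough sketch per its README (the model and output names are just examples):

```sh
# build the quantize tool and produce a 5-bit variant of the medium model
cd whisper.cpp
make quantize
./quantize models/ggml-medium.en.bin models/ggml-medium.en-q5_0.bin q5_0
```

Note that Cheetah appears to look for ggml-medium.en.bin specifically (per the cache path above), so you'd likely need to copy the quantized file into the cache folder under that name.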