juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Running Diart_Whisper on Windows and nothing happens #203

Closed by ScottSump 4 months ago

ScottSump commented 1 year ago

Hello, I've been trying to get your colored text demo working but nothing seems to happen. I've gotten the basic demo from this repo working fine, but whenever I run the colored text code it just stays like this:

C:\Users\Scott\anaconda3\envs\Dialog\Lib\site-packages\pyannote\audio\core\io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
C:\Users\Scott\anaconda3\envs\Dialog\Lib\site-packages\torch_audiomentations\utils\io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Scott\.cache\torch\pyannote\models--pyannote--segmentation\snapshots\2ffce0501d0aecad81b43a06d538186e292d0070\pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
Lightning automatically upgraded your loaded checkpoint from v1.2.7 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Scott\.cache\torch\pyannote\models--pyannote--embedding\snapshots\c6335d8f1cd77b30084387468a6cf26fea90009b\pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
Backend TkAgg is interactive backend. Turning interactive mode on.
Listening...

Would any of those warnings be causing the issue? It appears to be loading the model into VRAM.

I'm also going to go ahead and apologize in advance if it was something obvious/dumb.

Edit: I also changed OnlineSpeakerDiarization to SpeakerDiarization and PipelineConfig to SpeakerDiarizationConfig as you recommended in a different thread.

juanmc2005 commented 1 year ago

Hi @ScottSump, what do you mean by the "colored text demo"? Can you send a link to the gist/branch/script/PR you're trying to run?

Also, does the script crash or does it hang at "Listening..."? I would recommend that you debug the code line by line so we can better understand what's going on. Since I only have Linux myself, I'm unable to reproduce this issue.

For example, you could very well be receiving transcriptions but for some reason they're not showing, or you could be getting no transcriptions at all. I am also assuming your audio contains speech and has a sample rate of 16kHz.
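
One way to tell those two cases apart is to log what passes between the pipeline stages. The following is only a minimal sketch: `make_tap` is a hypothetical helper, not part of diart or the gist, and the stage names in the comment are placeholders. It assumes the script builds its pipeline with `rx.operators` (imported as `ops`), where `ops.map` accepts any one-argument function.

```python
from typing import Any, Callable


def make_tap(label: str, log: Callable[[str], None] = print) -> Callable[[Any], Any]:
    """Return a pass-through function that logs every value it sees.

    Hypothetical debugging helper: the value is returned unchanged, so the
    function can be dropped between stages without altering the pipeline.
    """
    count = 0

    def tap(value: Any) -> Any:
        nonlocal count
        count += 1
        log(f"[{label}] item {count}: {type(value).__name__}")
        return value

    return tap


# Assumed usage inside an rx pipeline (stage names are placeholders):
#
#   source.stream.pipe(
#       ops.map(make_tap("audio chunk")),
#       ...,  # diarization stage
#       ops.map(make_tap("after diarization")),
#       ...,  # transcription stage
#       ops.map(make_tap("after transcription")),
#   ).subscribe(...)

if __name__ == "__main__":
    # Stand-alone demonstration without rx: values pass through untouched.
    tap = make_tap("demo")
    for item in ["chunk", 3.14, {"text": "hello"}]:
        assert tap(item) is item
```

If the "audio chunk" tap prints but the later ones never do, the stall is upstream of the display code. If every tap prints, the transcriptions exist and the problem is in how they are shown.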

ScottSump commented 1 year ago

Hi Juan, thanks so much for answering. It's the code from your gist here: https://gist.github.com/juanmc2005/ed6413e697e176cb36a149d8c40a3a5b

Doing some stuff with the kids this weekend so can't get back to it until Monday, but think I'm going to keep testing it bit by bit to see where the problem is. The audio is coming from a mic and I'm pretty sure it's at 16kHz, but will verify that too. I'm also not familiar with rx.operators, so will have to read up on them so I can better understand the program flow.
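
For the sample rate check, a rough sketch like the one below might be enough. It assumes the `sounddevice` package is installed and reports the default input device's default rate, which is not necessarily the rate the script actually opens the stream with. `describe_rate` is a hypothetical helper written for this comment.

```python
TARGET_RATE = 16_000  # the 16 kHz rate mentioned in this thread


def describe_rate(device_rate: float, target: int = TARGET_RATE) -> str:
    """Say whether a device rate matches the target or needs resampling."""
    if round(device_rate) == target:
        return f"{device_rate:.0f} Hz matches the {target} Hz target"
    ratio = device_rate / target
    return (
        f"{device_rate:.0f} Hz differs from {target} Hz "
        f"(ratio {ratio:.3f}); resampling needed"
    )


if __name__ == "__main__":
    try:
        # Assumption: sounddevice (and a working PortAudio backend) is available.
        import sounddevice as sd

        info = sd.query_devices(kind="input")  # default input device
        print(info["name"], "->", describe_rate(info["default_samplerate"]))
    except Exception as exc:
        print("Could not query the input device:", exc)
```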

It hangs on "Listening..." with no crash. VRAM usage stays high and GPU usage seems to spike every once in a while, so it looks like it's trying to do something. I got pyannote working on its own, and diart works too, so I'm guessing the issue is either somewhere in the program flow (something just not getting triggered) or in the Whisper portion.

Thanks so much for the quick response. If you think of anything else, please let me know. I'll post what I figure out here as I progress.