juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Windows 10 - Exits with no errors or results #149

Open matbee-eth opened 1 year ago

matbee-eth commented 1 year ago

When I run diart.stream speakers:9, or execute it in a Python script, it simply prints a notice about sox_io and then exits. No errors. How do I figure out what's wrong?

 dev  diart.stream speakers:9
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.

If I remove this line: import diart.sources as src, I get further. But it obviously crashes once it's required.

juanmc2005 commented 1 year ago

Hi @matbee-eth,

When diart receives "speakers:9", it will interpret that as a path. You need to provide either "microphone:9" or the path to an existing file.
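
To illustrate the rule described above, here is a minimal sketch of how a source string could be classified. It is not diart's actual parsing code: `classify_source` is a hypothetical helper written only to show why "speakers:9" ends up being treated as a file path.

```python
from pathlib import Path


def classify_source(source: str):
    """Hypothetical helper (not part of diart): classify a source string.

    Returns ("microphone", device_index_or_None) or ("file", Path).
    """
    name, sep, device = source.partition(":")
    if name == "microphone":
        # "microphone" -> default device, "microphone:9" -> device index 9
        return "microphone", (int(device) if sep else None)
    # Anything else, including "speakers:9", falls through to a file path
    return "file", Path(source)
```

Under this assumed logic, `classify_source("speakers:9")` yields a path that does not exist on disk, which matches the behaviour described above.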

That said, there seems to be another problem with soundfile. Can you set the input to "microphone" and see if you get the same error?

matbee-eth commented 1 year ago

@juanmc2005 I tried both a path to a file and microphone, and it similarly just quits quietly. However, I am trying to monitor an audio-out device, not an input device.

juanmc2005 commented 1 year ago

Diart is not yet compatible with output devices, so I don't think that will work, although this is something I would really like to include in future versions.

Have you installed portaudio, pysoundfile and ffmpeg? Can you identify which line is blocking you in diart.sources?

matbee-eth commented 1 year ago

I have all of those installed. I'm unable to identify which line is blocking me. I've tried commenting out a plethora of code in diart.sources, but the same symptom occurred. I feel it's something else, and my lack of Python skills is of no help.

If you have any bounty system I'd love to contribute.

As an alternative to diart capturing audio-out, I could use ffmpeg to capture and pipe the data. Have you seen this done?

juanmc2005 commented 1 year ago

@matbee-eth you could try putting a breakpoint at the beginning of diart.sources and debugging line by line until it breaks or hangs.
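
A minimal sketch of one way to narrow this down, using only the standard library. The helper name `try_import` is hypothetical; the idea is that a process that exits with no Python error may be crashing inside a compiled extension, and `faulthandler` can print a traceback in that case.

```python
import faulthandler
import importlib
import traceback


def try_import(module_name: str) -> bool:
    """Import a module and report whether it succeeded (illustrative helper)."""
    try:
        # On a native crash (e.g. an access violation in a compiled extension),
        # dump the Python traceback instead of exiting silently.
        faulthandler.enable()
    except (AttributeError, ValueError, OSError):
        pass  # stderr is not a real file (e.g. captured output)
    try:
        importlib.import_module(module_name)
    except Exception:
        # Ordinary Python exceptions are printed with their full traceback
        traceback.print_exc()
        return False
    return True
```

Calling `try_import("diart.sources")` in the affected environment should either print a traceback or return `True`. For stepping line by line, the built-in `breakpoint()` call placed at the top of the module drops into pdb.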

> If you have any bounty system I'd love to contribute.

What kind of bounty system are you referring to?

> As an alternative to diart capturing audio-out, I could use ffmpeg to capture and pipe the data. Have you seen this done?

I haven't seen or tried this, but it looks like it could work! If you have a working example, could you post it here? This could be an interesting alternative for implementing a new audio source.
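
As a rough, untested sketch of the ffmpeg idea, the code below builds an ffmpeg command that writes raw 16-bit mono PCM to stdout and decodes that byte stream into chunks of float samples. Everything here is an assumption for illustration: the function names are hypothetical, the DirectShow device name has to be replaced with whatever ffmpeg lists on your machine, and nothing in it is wired into diart.

```python
import array
import subprocess
import sys

SAMPLE_RATE = 16000  # assumed target rate for the pipeline


def ffmpeg_command(device_name: str) -> list:
    """Build an ffmpeg command capturing a DirectShow audio device (Windows)."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-f", "dshow", "-i", f"audio={device_name}",
        "-ac", "1", "-ar", str(SAMPLE_RATE),  # mono, resampled by ffmpeg
        "-f", "s16le", "-",                   # raw signed 16-bit PCM to stdout
    ]


def pcm16_to_floats(raw: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def read_chunks(stream, samples_per_chunk: int):
    """Yield lists of float samples from a binary stream of s16le PCM."""
    bytes_per_chunk = samples_per_chunk * 2
    while True:
        raw = stream.read(bytes_per_chunk)
        if not raw:
            break
        # Drop a trailing odd byte so the buffer holds whole samples only
        yield pcm16_to_floats(raw[: len(raw) - len(raw) % 2])


def main(device_name: str) -> None:
    """Print the size of each half-second chunk read from ffmpeg."""
    proc = subprocess.Popen(ffmpeg_command(device_name), stdout=subprocess.PIPE)
    for chunk in read_chunks(proc.stdout, SAMPLE_RATE // 2):
        print(len(chunk))
```

Calling `main("Stereo Mix")` would start the capture, where "Stereo Mix" is only a placeholder device name. Letting ffmpeg resample to 16000 Hz keeps the Python side simple, at the cost of depending on an external process.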

sahith2k3 commented 1 year ago

@juanmc2005 is there any progress on this bug? I tried to use this on Ubuntu, and it failed there due to a sample rate error. I manually changed the sample rate to 48000, after which it progressed and showed a warning that the pipeline uses 16000 Hz and would resample, but it crashed later.

Hoping that it would work on Windows, I tried the same there, but it shows this output and stops.

(diart) C:\Windows\System32>diart.stream microphone:19
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.

I tried microphone with multiple input devices, but it still won't work.

Please let me know if there are any quick fixes I can apply to make it work on either OS. Thank you.

juanmc2005 commented 1 year ago

Hi @sahith2k3, the sample rate error is a known bug. It was fixed in #153. Can you install from that PR and see if it works? I've been trying to merge this and release v0.8 for some time but I don't have enough time with my full-time job.

juanmc2005 commented 1 year ago

Note that diart's inference time will suffer if your microphone doesn't support 16 kHz sampling. The dynamic resampling is quite slow. I added the possibility to resample on GPU, which might help with that, but the best option is to use a device that supports 16 kHz.
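
To make the resampling cost concrete, here is a deliberately crude sketch of going from 48 kHz to 16 kHz by averaging every three samples. This is a simple box-filter decimation chosen for illustration, not the resampling method diart uses, and `downsample_by_averaging` is a hypothetical name. Proper resamplers apply a better anti-aliasing filter and do more work per sample.

```python
def downsample_by_averaging(samples, factor=3):
    """Average each group of `factor` samples (48 kHz -> 16 kHz when factor=3)."""
    # Ignore leftover samples that do not fill a complete group
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]
```

Even this simple version touches every input sample for every chunk, so a device that records natively at 16 kHz avoids the extra work entirely.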