omlins / JustSayIt.jl

Software and high-level API for offline, low latency and secure translation of human speech to computer commands or text on Linux, MacOS and Windows
BSD 3-Clause "New" or "Revised" License

Add possibility to filter background noise from input audio #24

Open omlins opened 2 years ago

damian-666 commented 1 year ago

Filtering background noise properly needs time-domain analysis, which isn't possible without calibration and real-time DSP with complex mic arrays.

The simplest and best thing you can do is get a studio vocal mic (I'm using a Shure SV100, which is not ideal) for $30-60 and a preamp for under $200. I have an old PreSonus 2-channel preamp. The classic performers' favorite is the Shure SM58, SM48 or similar, plus a stand. It's a dynamic near-field cardioid microphone. You can use a small stand, but you have to speak very close to the mic, or put your hand behind it, to really isolate your vocals; from a few feet away it picks up almost nothing. These mics are durable: you can drop them, and you can run a long cable, so you don't need to sit if you have a big screen or projector. I'm sure there are wearable mics of comparable quality worth looking at, the kind used by performers who sing and run around the stage. Save your back.

This will eliminate reflections, room effects, or music you are playing; it's just too hard to do that electronically. You will get a consistent result for training. I'm running fans on and off, and my noise levels are variable.

There are tons of options. I just went with what I could find and what vocalists use; I'm only reporting what I tried.

Then I plug the PC audio output into the preamp, and the preamp's output into my speakers (little studio monitors), so I can mix everything and hear what the computer hears. It's also good for karaoke, and for hearing yourself, or noise, or feedback, live as you speak during Zoom meetings or whatever. Laptop built-in audio, and computer-specific audio in general, isn't great.

The SV100 is what I use; it's not ideal for vocals. The SM48 is. Other brands might be fine. I just could not use any voice recognition before I started using a good microphone, mixer and monitors. The total is about $400 to $500.

A cheap interface option is the M-Audio M-Track Duo (USB audio interface for recording, streaming and podcasting, with dual XLR, line and DI inputs, plus an included software suite). It's listed at $69.00. I haven't tried it, but it's cheaper than the $200 PreSonus I use. They are all 2-channel; I use one channel.

I tried a condenser microphone, and it distorts the voice too much and requires the 48 V supply. Remember to have the 48V button OFF if you use a dynamic mic like the Shure, or it will pop, move the diaphragm, and can break it. A condenser mic is a tiny, cheap, low-dynamic-range cylinder. The dynamic mics have large moving diaphragms.

This is a very cheap $15 alternative: the Shinco Handheld Wired Microphone, a cardioid dynamic vocal mic with a 13 ft cable and on/off switch, listed as "Ideally Suited for Speakers, Karaoke Singing Machine, Amp, Mixer". The chief complaint is that it's not loud. I'm not sure if it will work; it depends on your audio card input.

This software has such great potential. I just wish I could control my PC's OS, tools, Start menu and apps, and run commands and queries against the APIs of the apps and my own modules, maybe using a .NET Core application via the reflection interface and the current software's exposed object model, comments, and commands, or query the file system by voice. .NET Core can generalize across Linux, Windows and macOS, say for querying the file system or environment. I'm not sure how to integrate it with Julia, but there are some efforts on that.

If any projects are under way for hands-free command and control, I'll sponsor both. I could not install this last time and just don't have time. MS Voice Access is improving, but it is too generalized, like a mouse or keyboard overlay. I want to cut straight to the API and reduce the domain, for better guesses and direct one-shot commands and queries. I've begged MSFT to do something like this, but UI is always an opinionated issue. This would be indisputably ideal: not even one button, and often no need to even look at the screen.

Thanks. If you think this is feasible, I can push hard for sponsors and find the people to pitch to. I've been coding too long, I've had 4 back surgeries, and I'm still mousing and typing. While AI is coding for me, it cannot do the '90s NLP Prolog stuff, or DSLs, or dynamically reflect on APIs it was not trained on, and those have changed a lot since 2021. I can split this into a discussion if you think it's appropriate to use this technology for this and there are no conflicts of interest.

damian-666 commented 6 months ago

Any tests of vocal mics would be welcome. A dynamic mic is a natural voice band-pass: singing mics have a big, heavy diaphragm. So background and non-vocal sounds don't seem to get through to the offline Voice Access that MSFT ships in the Win 11 previews. I'm testing with that, but it's not open source as far as I know, though it might use open-source components.

I looked at and evaluated Picovoice hotwords. To me, hotwords are key. But they don't have a quick transition from hotword to "text to intent" (Rhino); a delay was reported and they didn't want to fix it. And it's a paid service. I find that the faster you talk, the less data they have to process and/or send, especially with Google Assistant online. And I hope to save time when this advanced stuff is fully integrated.

The clip-on, walk-around type of mic for speech could work. A condenser mic with a simple low-pass filter might be OK, but the filter will add a phase shift. Condensers tend to pick up high-frequency noise, and that's hard for the voice-to-text filters. Using digital convolution filters in addition seems to be asking for trouble.
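To make the low-pass trade-off concrete, here is a minimal Python sketch of a first-order (one-pole) low-pass filter. It is not part of JustSayIt.jl; the 3400 Hz cutoff, the 16 kHz sample rate, the test tones and all function names are assumptions picked for illustration. It compares how much of a 200 Hz "voice" tone and a 6 kHz "hiss" tone survives the filter, and uses the analog RC formula to estimate the phase lag.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Apply a first-order (RC-style) low-pass filter to a list of samples."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient between 0 and 1
    out = []
    y = 0.0
    for x in samples:
        y = y + alpha * (x - y)  # move the output a fraction of the way to the input
        out.append(y)
    return out

def rc_phase_lag_degrees(freq_hz, cutoff_hz):
    """Phase lag of an ideal analog first-order low-pass (an approximation
    for the digital filter above, less accurate near half the sample rate)."""
    return math.degrees(math.atan(freq_hz / cutoff_hz))

def sine(freq_hz, sample_rate_hz, n):
    """Generate n samples of a unit-amplitude sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

SAMPLE_RATE = 16000  # assumed sample rate in Hz
CUTOFF = 3400        # assumed cutoff in Hz, near the top of the telephone voice band
N = 16000            # one second of audio

voice_out = one_pole_lowpass(sine(200, SAMPLE_RATE, N), CUTOFF, SAMPLE_RATE)
hiss_out = one_pole_lowpass(sine(6000, SAMPLE_RATE, N), CUTOFF, SAMPLE_RATE)

# Measure on the second half so the start-up transient has died out.
voice_gain = peak(voice_out[N // 2:])
hiss_gain = peak(hiss_out[N // 2:])

print(f"200 Hz gain: {voice_gain:.3f}, lag: {rc_phase_lag_degrees(200, CUTOFF):.1f} deg")
print(f"6 kHz gain:  {hiss_gain:.3f}, lag: {rc_phase_lag_degrees(6000, CUTOFF):.1f} deg")
```

The low tone passes almost unchanged while the high tone is clearly attenuated, but every frequency is delayed by a different amount, which is the phase shift in question. A steeper filter would cut more noise at the cost of more phase distortion.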

But using low-latency, on-device hotword DSP is key, or adding devices, even buttons, tapping, or switching the mic, so that one can interrupt a process or a completion, or work offline. That covers "No", "Cancel", "Don't send it", etc.

But any decent vocal mic should be OK, or a list of recommended ones would help. It makes a night-and-day difference versus my laptop's mic array, on MSFT Voice Access anyway. I'm just starting to believe this can be generalized and replace the mouse and keyboard, hopefully with an LLM: preprocess, OCR, scrape, index, synonym, then completion choice by hotword using international NATO labels (alpha, bravo, charlie), so preprocess first, then work offline.
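As a rough illustration of picking a completion by NATO label, here is a small Python sketch. Nothing here comes from JustSayIt.jl or Voice Access; the function names and the sample completions are hypothetical, and the spelling "alpha" follows the wording above rather than the official ICAO "alfa".

```python
# Spoken labels, in order. "alpha" and "xray" are simplified spellings
# chosen for this sketch, not the official ICAO forms.
NATO_LABELS = [
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliett", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "xray", "yankee", "zulu",
]

def label_choices(choices):
    """Attach one NATO label to each candidate completion, in order."""
    if len(choices) > len(NATO_LABELS):
        raise ValueError("more choices than available labels")
    return dict(zip(NATO_LABELS, choices))

def resolve(spoken_word, labeled):
    """Return the completion for a spoken label, or None if it is not on offer."""
    return labeled.get(spoken_word.strip().lower())

# Hypothetical completions that a preprocessing step might have produced.
completions = ["open file", "open folder", "open terminal"]
labeled = label_choices(completions)

for label, text in labeled.items():
    print(f"{label}: {text}")

print(resolve("Bravo", labeled))
```

Because the recognizer only has to tell a handful of distinct label words apart, the choice step works with a very small vocabulary, which is the point of reducing the domain.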

I hope to see completion in context via synonyms that are auto-generated by LLMs, but when applied this can be a dimension-raising convexification issue, or it can be treated as a Dirichlet constraint. It seems simple, but with LLMs it's not. I'll move this bit to another issue; I'm overextended.

I'll do that when I learn more about which projects are going on and whether they can be generalized, because they should be, IMO. MSFT Semantic Kernel is being steered many ways. Now I'm steering it yet another way, but everyone is being heard, and consensus is way more likely than with visual UI. You discover features: see it, say it.

Thanks, I'll try to find time to evaluate more. By the way, I think I got them to agree to do the NATO comms, but I can't bet it will be fully implemented as I imagine. Right now I always rage-quit the one they have. It's pretty amazing, with billions invested, just not usable, but it could be.

I'll compare with this when I get Julia working; right now I can't build anything.