roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License
1.75k stars 179 forks

[FEAT] Audio Processors #571

Open starsoccer opened 1 year ago

starsoccer commented 1 year ago

Currently viseron seems to support a lot of video/image processors for classification, but I didn't see any for audio. It would be great if audio from viseron could be piped to an audio processor to do other classifications or STT (speech-to-text), which could ideally then be used for voice assistants.

roflcoopter commented 1 year ago

Cool idea! I am thinking you could maybe use it to start recordings when noises are heard.

What else did you have in mind when you say you could use it for voice assistants?

starsoccer commented 1 year ago

That would be a great use case too. When I say voice assistants, I mean doing simple automations/prompts. I have already written some custom code to do this myself, but for instance: when a person walks into a room, ask if the lights should be turned on.

The speaking part is likely outside viseron's wheelhouse, but using STT to process a yes/no would be cool.
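
For illustration, the yes/no step could be sketched roughly as below. This is only a sketch: it assumes some STT engine has already returned a plain transcript string, and `classify_yes_no` and the word lists are hypothetical names made up for this example, not part of viseron or any STT library.

```python
import re

# Hypothetical word lists for illustration; a real setup would tune these.
YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NO_WORDS = {"no", "nope", "nah"}


def classify_yes_no(transcript: str):
    """Return True for a yes, False for a no, None if the answer is unclear."""
    # Lowercase and split into words so punctuation does not get in the way.
    words = re.findall(r"[a-z']+", transcript.lower())
    has_yes = any(word in YES_WORDS for word in words)
    has_no = any(word in NO_WORDS for word in words)
    # Neither or both present: treat as ambiguous rather than guessing.
    if has_yes == has_no:
        return None
    return has_yes
```

Returning `None` for ambiguous answers lets the automation re-ask or do nothing instead of acting on a guess.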

roflcoopter commented 1 year ago

Cool! Is your code accessible anywhere?

starsoccer commented 1 year ago

Yup, https://github.com/starsoccer/rtsp-to-text

It's a few months old at this point, from before Home Assistant started the Year of the Voice. I think at this point it would probably be much cooler if viseron could just expose microphones from cameras to HA so they can act as voice assistants using the Assist pipeline.