andytriboletti opened this issue 3 months ago
Hi @andytriboletti, the user makes a gesture to the device (like pressing a button) when they want to 'enroll' a speaker they want to hear, while looking at them.
Right, but how do I tell the python script that the user pushed a button?
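One minimal way to wire such a "button pressed" signal into a running Python script is a shared flag that the audio loop polls. This is only an illustrative sketch, not code from the LookOnceToHear repo; names like `enroll_requested` and `on_button_press` are made up for the example, and the actual input backend (GPIO interrupt, keyboard hook, network message) is up to you:

```python
# Hypothetical sketch: signaling a button press to a running script.
# All names here are illustrative, not from the LookOnceToHear codebase.
import threading

# Shared flag the audio loop can poll; set it from any input source
# (a GPIO interrupt handler, a keyboard listener, a network message, ...).
enroll_requested = threading.Event()

def on_button_press():
    """Call this from whatever input backend delivers the button event."""
    enroll_requested.set()

def audio_loop_step():
    """One iteration of a processing loop: if the button was pressed,
    switch to enrollment and clear the flag; otherwise keep listening."""
    if enroll_requested.is_set():
        enroll_requested.clear()
        return "enrolling"
    return "listening"
```

Using `threading.Event` keeps the handoff thread-safe, so the press can arrive from an interrupt or listener thread while the audio loop runs elsewhere.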
@andytriboletti, if you're referring to slurm.py in the top-level directory, it is for training the model on a cluster. You can find the scripts for running the model in this dir: https://github.com/vb000/LookOnceToHear/tree/main/src. Specifically this: https://github.com/vb000/LookOnceToHear/blob/main/src/ts_hear_test.py. Just a clarification: this repo only contains training and evaluation code for the neural network model we developed for our proof-of-concept system. It is not the code for the system design itself.
So does that mean we cannot test it on our local machine? I assume the inference file is still missing or not yet added for users to test. It would be super cool if we could take the model, run inference on our own audio, and test it in our environment.
Same question for me
@vb000 Could you add a simple Python script for inference: input a mixture wav and an enrollment wav, output the result wav?
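The wrapper being asked for is mostly I/O plumbing around one model call. Here is a sketch of that plumbing using only the standard library; the `separate` function is a placeholder (the real version would load the trained checkpoint and call the network, as in src/ts_hear_test.py), and all names are illustrative:

```python
# Sketch of an inference wrapper: read a mixture wav and an enrollment
# wav, run the model, write the result wav. `separate` is a placeholder
# for the actual network call; names are illustrative.
import wave
import array

def read_wav(path):
    """Read 16-bit PCM samples (assumes a little-endian host)."""
    with wave.open(path, "rb") as w:
        params = w.getparams()
        samples = array.array("h", w.readframes(params.nframes))
    return params, samples

def write_wav(path, params, samples):
    with wave.open(path, "wb") as w:
        w.setparams(params)
        w.writeframes(samples.tobytes())

def separate(mixture, enrollment):
    # Placeholder: the real model would use `enrollment` to extract the
    # target speaker from `mixture`. Here the audio passes through.
    return mixture

def run_inference(mixture_path, enroll_path, out_path):
    params, mixture = read_wav(mixture_path)
    _, enrollment = read_wav(enroll_path)
    write_wav(out_path, params, separate(mixture, enrollment))
```

Swapping the body of `separate` for the repo's model forward pass would turn this into the requested mixture/enroll-in, result-out script.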
I see a Python script I can run on my computer. I haven't tried it yet, but I think I could connect a microphone, process the audio in real time, and output it in real time. However, I don't know how to detect the user looking at someone. Could you tell me how that works? I found this project from: https://www.washington.edu/news/2024/05/23/ai-headphones-noise-cancelling-target-speech-hearing/
Thank you.
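On the real-time part of the question: streaming processing usually means consuming microphone input in fixed-size chunks and processing each chunk as it arrives. The sketch below shows only that block-wise loop; the chunk size and the `process` body are illustrative, and an actual setup would use an audio I/O library such as sounddevice or pyaudio for capture and playback:

```python
# Sketch of block-wise streaming: input is consumed in fixed-size
# chunks, each chunk is processed, and the results are concatenated,
# as a capture callback would do in real time. Values are illustrative.
CHUNK = 256  # samples per block (illustrative choice)

def process(chunk):
    # Stand-in for the model's per-chunk inference.
    return chunk

def stream(samples, chunk_size=CHUNK):
    """Process `samples` block by block, mimicking a real-time loop."""
    out = []
    for i in range(0, len(samples), chunk_size):
        out.extend(process(samples[i:i + chunk_size]))
    return out
```

Keeping `process` fast enough to finish within one chunk's duration is what makes such a loop viable in real time.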