andytriboletti opened this issue 3 months ago
Hi @andytriboletti, the user makes a gesture to the device (like pressing a button) when they want to 'enroll' a speaker they want to hear, while looking at them.
Right, but how do I tell the python script that the user pushed a button?
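One minimal way to wire such a "button pressed" signal into a running Python script is a shared flag that the audio loop polls. This is only an illustrative sketch, not code from the LookOnceToHear repo; names like `enroll_requested` and `on_button_press` are made up for the example, and the actual input backend (GPIO interrupt, keyboard hook, network message) is up to you:

```python
# Hypothetical sketch: signaling a button press to a running script.
# All names here are illustrative, not from the LookOnceToHear codebase.
import threading

# Shared flag the audio loop can poll; set it from any input source
# (a GPIO interrupt handler, a keyboard listener, a network message, ...).
enroll_requested = threading.Event()

def on_button_press():
    """Call this from whatever input backend delivers the button event."""
    enroll_requested.set()

def audio_loop_step():
    """One iteration of a processing loop: if the button was pressed,
    switch to enrollment and clear the flag; otherwise keep listening."""
    if enroll_requested.is_set():
        enroll_requested.clear()
        return "enrolling"
    return "listening"
```

Using `threading.Event` keeps the handoff thread-safe, so the press can arrive from an interrupt or listener thread while the audio loop runs elsewhere.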
@andytriboletti, if you're referring to slurm.py in the top-level directory, it is for training the model on a cluster. You can find the scripts for running the model in this dir: https://github.com/vb000/LookOnceToHear/tree/main/src. Specifically this: https://github.com/vb000/LookOnceToHear/blob/main/src/ts_hear_test.py. Just a clarification: this repo only contains training and evaluation code for the neural network model we developed for our proof-of-concept system. It is not the code for the system design itself.
So does that mean we cannot test it on our local machine? I assume the inference file is still missing or not yet added for users to test. It would be super cool if we could take the model, run inference on our own audio, and test it in our environment.
Same question for me
@vb000 Could you add a simple Python script for inference: input a mixture wav and an enrollment wav, output the result wav?
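The wrapper being asked for is mostly I/O plumbing around one model call. Here is a sketch of that plumbing using only the standard library; the `separate` function is a placeholder (the real version would load the trained checkpoint and call the network, as in src/ts_hear_test.py), and all names are illustrative:

```python
# Sketch of an inference wrapper: read a mixture wav and an enrollment
# wav, run the model, write the result wav. `separate` is a placeholder
# for the actual network call; names are illustrative.
import wave
import array

def read_wav(path):
    """Read 16-bit PCM samples (assumes a little-endian host)."""
    with wave.open(path, "rb") as w:
        params = w.getparams()
        samples = array.array("h", w.readframes(params.nframes))
    return params, samples

def write_wav(path, params, samples):
    with wave.open(path, "wb") as w:
        w.setparams(params)
        w.writeframes(samples.tobytes())

def separate(mixture, enrollment):
    # Placeholder: the real model would use `enrollment` to extract the
    # target speaker from `mixture`. Here the audio passes through.
    return mixture

def run_inference(mixture_path, enroll_path, out_path):
    params, mixture = read_wav(mixture_path)
    _, enrollment = read_wav(enroll_path)
    write_wav(out_path, params, separate(mixture, enrollment))
```

Swapping the body of `separate` for the repo's model forward pass would turn this into the requested mixture/enroll-in, result-out script.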
I see a Python script I can run on my computer. I haven't tried it yet, but I think I could connect a microphone, process the audio in real time, and output it in real time. However, I don't know how to detect the user looking at someone. Could you tell me how that works? I found this project from: https://www.washington.edu/news/2024/05/23/ai-headphones-noise-cancelling-target-speech-hearing/
Thank you.
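On the real-time part of the question: streaming processing usually means consuming microphone input in fixed-size chunks and processing each chunk as it arrives. The sketch below shows only that block-wise loop; the chunk size and the `process` body are illustrative, and an actual setup would use an audio I/O library such as sounddevice or pyaudio for capture and playback:

```python
# Sketch of block-wise streaming: input is consumed in fixed-size
# chunks, each chunk is processed, and the results are concatenated,
# as a capture callback would do in real time. Values are illustrative.
CHUNK = 256  # samples per block (illustrative choice)

def process(chunk):
    # Stand-in for the model's per-chunk inference.
    return chunk

def stream(samples, chunk_size=CHUNK):
    """Process `samples` block by block, mimicking a real-time loop."""
    out = []
    for i in range(0, len(samples), chunk_size):
        out.extend(process(samples[i:i + chunk_size]))
    return out
```

Keeping `process` fast enough to finish within one chunk's duration is what makes such a loop viable in real time.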