JpEncausse opened this issue 4 years ago
Your question is less about TensorFlow and more about how JavaScript and browsers execute work. The browser will render choppily to the screen if the UI thread is blocked. This is the default thread, and it is used for anything that touches the DOM. In this case, because you're processing a video stream, let's assume the processing of that video and the handing off of that video to face-api are all happening on the main UI thread.

To optimize this, you could have the UI thread do nothing but capture the video and convert it into data face-api can process, then run the detection itself on a worker thread, which would not block the UI thread.

You may still hit a second issue, though: CPU-bound processing is not the only bottleneck. On machines with weak, old, or integrated graphics cards, you can also become GPU bound. The UI thread also needs access to the GPU, and TensorFlow is not very polite in the way it splits work out to the GPU. Both cases result in choppy rendering, which has nothing to do with the audio of the system. If your background process is dominating your CPU threads, that can also starve the browser of the resources it needs to run the UI thread quickly.
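A rough sketch of that capture/worker split, assuming the detector can consume raw pixel data off the main thread (`detection-worker.js` and the message shape are placeholders, not face-api's actual API):

```js
// main thread: only grab frames and hand the raw pixels to a worker
const video = document.querySelector('video');
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
const worker = new Worker('detection-worker.js'); // placeholder worker file

function captureFrame() {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
  // transfer the pixel buffer instead of copying it across threads
  worker.postMessage(
    { width: frame.width, height: frame.height, buffer: frame.data.buffer },
    [frame.data.buffer]
  );
}

worker.onmessage = () => {
  // detection results come back here; schedule the next capture
  requestAnimationFrame(captureFrame);
};

video.addEventListener('playing', () => captureFrame());

// detection-worker.js (sketch): rebuild the ImageData and run detection on it
// self.onmessage = (e) => {
//   const { width, height, buffer } = e.data;
//   const pixels = new ImageData(new Uint8ClampedArray(buffer), width, height);
//   /* run the detector on `pixels`, then: */ self.postMessage({ faces: [] });
// };
```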
What you can do that will work, however, is to simplify the face detection model and framework used, or to offload the processing to a server where it can be done in a faster, lower-level, properly threaded language. In the former camp, you lose features like landmarks, facial deltas, and descriptors. In the latter, you can only scale by how much of your own processing power you are willing to pay for, and the bandwidth cost is the same as streaming video. Neither is ideal, but both work pretty well.
https://github.com/auduno/headtrackr <-- this is a very, very old project that used bleeding-edge APIs at the time and runs fast, but it has a limited feature set. It's incredible how fast you can get things to run when you lose the heft of TensorFlow and big ML models.
Thanks for the explanation and the repo, very interesting. I'll dig into it.
I found some basic bugs/optimizations in my code that reduce the CPU usage. I'll clean it up to make better use of a Worker.
I found something weird: in Face-API.js the samples grab a frame from the camera and then call setTimeout(). So I did the same thing, with a threshold for the timeout (I saw other libraries doing that as well).
BUT, when I started to play with start/stop versus clearTimeout, things seemed to get scrambled.
Maybe it's my code. Can we assume setTimeout / clearTimeout on webcam frames is synchronous (because the JS engine is single-threaded)? I'll try to simplify my code to check that.
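To make that concrete, the pattern I'm describing is roughly this (a simplified sketch; `runDetection` is just a stand-in for the actual face-api call and returns a promise):

```js
const video = document.querySelector('video');
let timerId = null;
const THRESHOLD_MS = 1000; // threshold between frames, roughly 1 FPS

function loop() {
  runDetection(video).then(() => {
    // only schedule the next frame once the current detection has finished
    timerId = setTimeout(loop, THRESHOLD_MS);
  });
}

function start() {
  if (timerId === null) loop();
}

function stop() {
  clearTimeout(timerId);
  timerId = null;
}

// Note: if start() is called while a detection is still in flight, a second
// loop can get scheduled, and stop() during an in-flight detection does not
// prevent the .then() from re-scheduling. That may be the kind of scrambling
// I'm seeing with start/stop vs clearTimeout.
```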
Using setTimeout to defer to the next tick/event loop should really be considered a bad practice by now. Promises are way faster and nicer on the system than timeouts. They don't run the same way, but they produce the same result of yielding so other work can be done. It allows the JavaScript engine to decide how it wants to schedule the work, and there isn't that nasty minimum 10 ms limit between cycles like with setTimeout, so all your broken-apart work will actually finish faster. It really shows up when you're running large batches of work.
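For illustration, something like this (the MessageChannel wrapper is just one way I'd sketch a promise-based yield that avoids the setTimeout clamp, not anything face-api itself does):

```js
// A promise-based yield: resolves on the next task, with no setTimeout clamp
function yieldToEventLoop() {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    port1.onmessage = () => resolve();
    port2.postMessage(null);
  });
}

// Break a big batch of work into chunks and yield between them
async function runInChunks(items, chunkSize, handleItem) {
  for (let i = 0; i < items.length; i += chunkSize) {
    items.slice(i, i + chunkSize).forEach(handleItem);
    await yieldToEventLoop(); // rendering and input get a chance to run here
  }
}
```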
Btw, if you do look at that headtrackr repo, fair warning that it requires minor tweaks to the getUserMedia calls to make it work with the current standard. Remember, it was using draft specs and bleeding-edge APIs back when it came out, and it's an abandoned project, so it hasn't been fixed or advanced since. It's just an example of how fast and light things can be when fully optimized for constrained hardware.
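The tweak is roughly this (a sketch; the exact call sites in headtrackr may differ):

```js
// headtrackr-era code used the prefixed, callback-style draft API, roughly:
//   navigator.webkitGetUserMedia({ video: true }, onStream, onError);
// the current standard is promise-based and hangs off mediaDevices:
navigator.mediaDevices.getUserMedia({ video: true })
  .then((stream) => {
    const video = document.querySelector('video');
    video.srcObject = stream; // older code assigned a URL from createObjectURL
    return video.play();
  })
  .catch((err) => console.error('getUserMedia failed:', err));
```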
I wanted to get into easing, predictive tracking, and pupils for glancing. I really don't care about facial features (ears, jaw, mouth, eyebrows, emotions) as much as I just want z depth, a facial fingerprint, and view angle. If I know where your eyes are, how far away they are, and where they're looking, and I can animate field of view and depth of field and so on in 3D, the head-coupled perspective thing would get super cool for building simple 3D games like world runners.
Hello, I'm running FaceAPI on a raw webcam connected to a ChromeBit (so Chrome on ChromeOS). At the same time I also capture raw audio with a ScriptProcessor to perform speech recognition (server side).
It works very well for capturing the face, landmarks, etc., so I let it run at 1 FPS to detect whether someone is in front of the camera.
Each second, FaceAPI runs detection against the current frame. That creates a CPU peak with side effects on the audio recording (it sounds like some tremble / echo).
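For reference, the loop is roughly this (a simplified sketch; the model loading is elided and `onPresence` is just a placeholder for whatever reacts to the result):

```js
const video = document.querySelector('video');

async function checkForFace() {
  // assumes the detector and landmark models were already loaded at startup
  const result = await faceapi
    .detectSingleFace(video, new faceapi.TinyFaceDetectorOptions())
    .withFaceLandmarks();
  // result is undefined when nobody is in front of the camera
  onPresence(!!result); // placeholder callback
  // this detection call is what produces the once-per-second CPU peak
  setTimeout(checkForFace, 1000); // roughly 1 FPS
}

checkForFace();
```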
My understanding is that even though JS is single-threaded, there are side effects when working on audio and video at the same time that could mess things up.
Do you know whether TensorFlow.js can cause that kind of side effect? I tried to move part of the audio processing into a Web Worker, but there are still some issues.
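For reference, the split I tried looks roughly like this (a sketch; `audio-worker.js` is a placeholder). The ScriptProcessorNode callback itself always runs on the main thread, so only the downstream processing moves to the worker:

```js
const audioCtx = new AudioContext();
const worker = new Worker('audio-worker.js'); // placeholder worker file

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const source = audioCtx.createMediaStreamSource(stream);
  // ScriptProcessorNode callbacks always fire on the main thread
  const processor = audioCtx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (e) => {
    // copy the samples and transfer them; the worker does the heavy lifting
    const samples = new Float32Array(e.inputBuffer.getChannelData(0));
    worker.postMessage(samples, [samples.buffer]);
  };

  source.connect(processor);
  processor.connect(audioCtx.destination); // keeps the node processing
});
```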