web worker performance issue

vladmandic / human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition

https://vladmandic.github.io/human/demo/index.html

MIT License

web worker performance issue #2

Closed vladmandic closed 4 years ago

vladmandic commented 4 years ago

Human is compatible with new web workers, but...

web workers are finicky:

cannot pass HTMLImageElement or HTMLVideoElement to a web worker, so need to pass a canvas instead
a canvas can call transferControlToOffscreen() and become an OffscreenCanvas,
which can be passed to a worker, but...
cannot transfer a canvas that already has a rendering context (basically, once getContext() has been executed on it)

which means that if we pass main canvas that will be used to render results on,
then all operations on it must be within webworker and we cannot touch it in the main thread at all.
doable, but...how to paint a video frame on it before we pass it?

so we create a new OffscreenCanvas, draw the video frame on it and pass its imageData
and return results from the worker, but then there is the overhead of creating it and passing large messages between main thread and worker - it ends up being slower than executing in the main thread.

Human already executes everything in async/await manner and avoids synchronous operations as much as possible so it doesn't block the main thread, so not sure what is the benefit of web workers (unless main thread is generally a very busy one)?

ost12666 commented 4 years ago

What performance do you see without web workers: how many milliseconds is the UI thread blocked for a single detection, and what FPS can you achieve?

vladmandic commented 4 years ago

depends on enabled modules - simple face detection runs at 50+ FPS, but it can drop to 5-10 FPS if everything is enabled. and that's on a low-end GPU.

but main thread blocking is minimal regardless of performance as everything is written in an asynchronous way and you can use a promise to wait for the result, so the main thread is free to do whatever during detection. even weights loading is asynchronous.

ost12666 commented 4 years ago

Will be happy to try the library but we must have smile detection, do you have plans for face expressions?

vladmandic commented 4 years ago

not immediately, but good idea.

just a thought: given this library has much more detailed face geometry (468 points for a base face alone, plus extras), it may be ok to extrapolate simple expressions using points math instead of using ml model.

i already do some simple stuff like:

  gestures.push(`facing ${((face.annotations['rightCheek'][0][2] > 0) || (face.annotations['leftCheek'][0][2] < 0)) ? 'right' : 'left'}`);

  // find() returns undefined when a shoulder is missing or scores below the threshold
  const leftShoulder = pose.keypoints.find((a) => (a.score > params.minThreshold) && (a.part === 'leftShoulder'));
  const rightShoulder = pose.keypoints.find((a) => (a.score > params.minThreshold) && (a.part === 'rightShoulder'));
  if (leftShoulder && rightShoulder) gestures.push(`leaning ${(leftShoulder.position.y > rightShoulder.position.y) ? 'left' : 'right'}`);

(plenty room for optimization)
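
As an illustration of the "points math" idea above, here is a minimal sketch of a smile heuristic. It is not code from the library: the point names (`leftCorner`, `rightCorner`, `upperLip`, `lowerLip`), the helper names and the 0.05 threshold are all hypothetical, and the only assumption about the data is that each point is an `[x, y, z]` array in image coordinates where y grows downward.

```javascript
// Hypothetical helper: how far the mouth corners sit above the lip center,
// normalized by mouth width so the value does not depend on face size.
function mouthCurvature({ leftCorner, rightCorner, upperLip, lowerLip }) {
  const width = Math.hypot(rightCorner[0] - leftCorner[0], rightCorner[1] - leftCorner[1]);
  if (width === 0) return 0; // degenerate input, avoid division by zero
  const centerY = (upperLip[1] + lowerLip[1]) / 2;
  const cornersY = (leftCorner[1] + rightCorner[1]) / 2;
  return (centerY - cornersY) / width; // > 0 when corners are higher than the lip center
}

// The threshold is an arbitrary illustrative value, not a tuned one.
function describeMouth(points, threshold = 0.05) {
  return mouthCurvature(points) > threshold ? 'smiling' : 'neutral';
}

// Made-up sample points, only to exercise the functions.
const smiling = { leftCorner: [40, 96, 0], rightCorner: [80, 96, 0], upperLip: [60, 98, 0], lowerLip: [60, 106, 0] };
const neutral = { leftCorner: [45, 102, 0], rightCorner: [75, 102, 0], upperLip: [60, 100, 0], lowerLip: [60, 104, 0] };

const gestures = [];
gestures.push(`mouth ${describeMouth(smiling)}`);
gestures.push(`mouth ${describeMouth(neutral)}`);
```

A ratio like this ignores head rotation, so it would likely degrade on side poses in the same way a model-based approach does.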

vladmandic commented 4 years ago

in either case, web workers are supported and there is a demo using them in demo/demo-webworker, i'm just not sure i'd use them much personally.

ost12666 commented 4 years ago

thanks, we will take a look, maybe we can even contribute

vladmandic commented 4 years ago

i'm totally open for that!

vladmandic commented 4 years ago

@ost12666 regarding emotion detection

i've just added it and updated docs. really like the idea. it's based on a really small (200kb) tfjs model and seems to work ok when the face is facing the camera and not so much for side poses.

i'm closing this issue as web worker support exists and i'll maintain it moving forward.

vladmandic commented 4 years ago

revising web workers - they are actually VERY useful. when using the webgl backend with an idle main thread, there is no point since every atomic function is so fast that the main thread is never blocked for more than a few ms. but...when using cpu or wasm backends, detection kills responsiveness of the main thread and the overall performance drop is huge. however, running the same in a worker thread works like a charm - up to the point that if you have a good cpu, it works almost as fast as with 'webgl' and the main thread never suffers!

ost12666 commented 4 years ago

even a few ms is too much, so maybe it will also help with webgl?

vladmandic commented 4 years ago

it makes it slower by 2-3 fps (not more), but UI is 100% responsive - so it's a tradeoff.

ost12666 commented 4 years ago

How do you transfer data to the web worker?

vladmandic commented 4 years ago

      // copy the current frame onto an intermediary canvas, then read its pixels back
      const offscreen = new OffscreenCanvas(input.width, input.height);
      const ctx = offscreen.getContext('2d');
      ctx.drawImage(input, 0, 0, input.width, input.height, 0, 0, input.width, input.height);
      const data = ctx.getImageData(0, 0, input.width, input.height);
      worker.postMessage({ data }); // without a transfer list the ImageData is copied

where input can be anything, but typically is HTMLVideoElement. without web worker, there is no need for this intermediary offscreen canvas at all since human.detect(input) accepts DOM element directly.

ost12666 commented 4 years ago

On some browsers you can use transferables for zero copy cost https://developer.mozilla.org/en-US/docs/Web/API/Transferable

vladmandic commented 4 years ago

yeah, but it's a catch-22 - you can only transfer a canvas that has no context - which means i can't paint it before i transfer it and thus there is nothing to detect inside the worker :(

ost12666 commented 4 years ago

I mean transfer the image data not the canvas: https://benjaminbenben.com/2013/04/14/webworker-qr/

vladmandic commented 4 years ago

Transferable is implemented by ArrayBuffer, MessagePort, ImageBitmap and OffscreenCanvas. So how do I get data from an HTMLVideoElement frame other than placing it on an OffscreenCanvas? Unfortunately, HTMLVideoElement does not have an imageData property.

Without workers, I can read directly from HTMLVideoElement using tf.browser.fromPixels() and avoid canvas for read operations completely - thus the performance difference.

ost12666 commented 4 years ago

From the link above: worker.postMessage(imagedata, [imagedata.data.buffer]);

I am not sure what happens to the actual imagedata

https://www.kevinhoyt.com/2018/10/31/transferable-imagedata/

vladmandic commented 4 years ago

That might cut down some processing as it's passing data by reference instead of by value. Still need an intermediary OffscreenCanvas to draw on and get data from using getImageData()

Should be an improvement, although still some overhead remains - I'll update with results soon.
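
The by-reference hand-off discussed above can be sketched roughly as follows. This is an illustration, not the library's code: `packFrame` is a hypothetical helper, and the 1280x720 frame is a plain stand-in object so the effect can be observed without a canvas. `structuredClone` with a transfer list is used here because it applies the same transfer rules as `postMessage`.

```javascript
// Hypothetical helper: split an ImageData-like object
// ({ data: Uint8ClampedArray, width, height }) into a message and a transfer
// list, so that postMessage moves the pixel buffer instead of copying it.
function packFrame(imageData) {
  const message = { data: imageData.data, width: imageData.width, height: imageData.height };
  const transfer = [imageData.data.buffer];
  return { message, transfer };
}

// Browser usage (not executed here; `worker`, `ctx`, `w` and `h` are assumed to exist):
//   const { message, transfer } = packFrame(ctx.getImageData(0, 0, w, h));
//   worker.postMessage(message, transfer);

// Stand-in for one RGBA frame: 4 bytes per pixel.
const frame = { data: new Uint8ClampedArray(1280 * 720 * 4), width: 1280, height: 720 };
const bytesBefore = frame.data.byteLength; // 3686400 bytes per frame

const { message, transfer } = packFrame(frame);
const received = structuredClone(message, { transfer });

// The sender's buffer is now detached: the pixels live only on the receiving side.
const bytesAfter = frame.data.byteLength; // 0
```

The sender cannot reuse the pixel data after the call, which is fine when a fresh `getImageData()` result is produced for every frame anyway.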

vladmandic commented 4 years ago

It's better, definitely decreases latency from ~22ms to ~15ms. But it's still slower than executing in the main thread without the need for an intermediary canvas or passing messages.

I also did a simple test: just drawing an image on a canvas, passing its image buffer by reference to the worker and measuring the round trip (the worker does no actual work, it only echoes the message) - it's about 15ms

and that's a constant, so:

if a module executes at 60+FPS, this reduces its performance by half (15ms execution + 15ms latency)
if a model executes at 10FPS, it reduces its performance by about 1FPS only (100ms execution + 15ms latency)
if a model executes at 5FPS, performance reduction is negligible (200ms execution + 15ms latency)
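
The arithmetic behind these three cases can be reproduced with a small sketch. It assumes the round-trip latency is simply added to every frame (no overlap between transfer and detection), and `fpsWithLatency` is a hypothetical helper name.

```javascript
// Effective frame rate when a fixed per-frame latency is added to the
// time a detection takes on its own.
function fpsWithLatency(baseFps, latencyMs) {
  const frameMs = 1000 / baseFps; // time per frame without the worker
  return 1000 / (frameMs + latencyMs);
}

for (const fps of [60, 10, 5]) {
  console.log(`${fps} FPS -> ${fpsWithLatency(fps, 15).toFixed(1)} FPS with a 15ms round trip`);
}
// prints:
// 60 FPS -> 31.6 FPS with a 15ms round trip
// 10 FPS -> 8.7 FPS with a 15ms round trip
// 5 FPS -> 4.7 FPS with a 15ms round trip
```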

like i said earlier - it's a tradeoff - slightly lower FPS vs. responsive UI - every user can make their own choice.

ost12666 commented 4 years ago

thanks, you are awesome, I hope you get some sleep from time to time :)

vladmandic commented 4 years ago

naah, sleeping is overrated :)

did you notice the recently added emotion detection?

ost12666 commented 4 years ago

not at my age!

I noticed, thanks a lot! we are going to integrate it soon instead of face-api and provide you with feedback