tinylibs / tinylet

🎨 redlet(), greenlet(), bluelet(), and more threading helpers for web Workers
MIT License

[Discussion] Benchmarks! #5

Open jcbhmr opened 1 year ago

jcbhmr commented 1 year ago

This is a discussion thread to discuss WHY the benchmarks are the way they are and how to improve on them

https://github.com/tinylibs/tinylet/issues/4#issuecomment-1626187830

jcbhmr commented 1 year ago

@jimmywarting re your thread from https://github.com/tinylibs/tinylet/issues/4#issuecomment-1626187830

Yeah, I'm getting similarish results.

https://github.com/tinylibs/tinylet/blob/72fe63f9b4f9f8f0044e187e5810025f2ac73ef1/test/async-to-sync-comparison.bench.js#L33-L60

[benchmark screenshot]

I think the reason synckit is so much faster is that it's not transferring the data: URL each time. It's ONLY transferring the arguments array. That's it.

const msg: MainToWorkerMessage<Parameters<T>> = { sharedBuffer, id, args }
//                                              👆 obj is 1      👆 id is 3
//                                                    👆 sab ptr is 2 👆 args are N

👆 That's only N+3 "things" that need to get serialized/transferred each call. Compare that to:

port.postMessage([lockBuffer, executorURL, this, [...arguments]]);
//               👆 array is 1  👆 str is M length, needs to be copied
//                                           👆 this is usually 1 (undefined)
//                                                     👆 arguments are N

👆 This is N+M+4. I think that might be why it's slower than synckit?

jimmywarting commented 1 year ago

I think the reason synckit is so much faster is that it's not transferring the data: URL each time

O_o

In my own test I mostly only benchmark the function's execution time, not the time it takes to load up a new worker. So my bench test only calls this once:

  const url = "data:text/javascript," + encodeURIComponent(code)
  const { default: fn } = await import(url)

therefore the data: URL is only transferred once.

jimmywarting commented 1 year ago

my assumption as to why synckit is faster is b/c it cheats and uses receiveMessageOnPort: it does not use any (de)serialize methods to transfer the data from the worker to the main thread via a SharedArrayBuffer

it uses postMessage instead, which is a no-go for other env solutions.

jcbhmr commented 1 year ago

When I remove the 200 bytes of data: URL that was getting transferred each time, it reduced the time enough that now tinylet/redlet() is the fastest!

[benchmark screenshot]

I'm currently using a very crude caching system. I need to make it a bit more robust to failure so that having something throw doesn't mean game over 😅

child worker doing the receiving https://github.com/tinylibs/tinylet/blob/e3d21f4be850777ebff3eb72d55d8595710f2536/src/redlet-node.js#L35-L41

parent caller outside worker https://github.com/tinylibs/tinylet/blob/e3d21f4be850777ebff3eb72d55d8595710f2536/src/redlet-node.js#L103-L108
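The caching idea can be sketched roughly like this (the function and variable names here are hypothetical illustrations, not the actual redlet-node.js code):

```javascript
// Hypothetical caller-side cache: ship the executor URL only on the
// first call; afterwards the worker can look it up by a small int id.
const executorIds = new Map();
let nextId = 0;

function buildCallMessage(lockBuffer, executorURL, thisArg, args) {
  let id = executorIds.get(executorURL);
  if (id === undefined) {
    id = nextId++;
    executorIds.set(executorURL, id);
    // First call: pay the M-char URL cost once.
    return [lockBuffer, id, executorURL, thisArg, args];
  }
  // Every later call: the M-char URL collapses to one small int.
  return [lockBuffer, id, null, thisArg, args];
}
```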

jcbhmr commented 1 year ago

my assumption as to why synckit is faster is b/c it cheats and uses receiveMessageOnPort: it does not use any (de)serialize methods to transfer the data from the worker to the main thread via a SharedArrayBuffer

it uses postMessage instead, which is a no-go for other env solutions.

You may be right. I think the ideal end-game is having a specialized export for Node.js that uses receiveMessageOnPort() to get 🏎🏎 speed, plus a normal browser-compatible, Deno-compatible version. Both exposed as the same entry point so that you don't need to care about the implementation; it just auto-routes to the best option for your platform using export conditions.
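That auto-routing could look something like this in package.json (a sketch; `./src/redlet-web.js` is a hypothetical name for the portable build — only redlet-node.js appears in the repo links above):

```json
{
  "name": "tinylet",
  "type": "module",
  "exports": {
    ".": {
      "node": "./src/redlet-node.js",
      "default": "./src/redlet-web.js"
    }
  }
}
```

With this, `import { redlet } from "tinylet"` resolves to the receiveMessageOnPort() fast path under Node.js and to the portable SharedArrayBuffer version everywhere else.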

jimmywarting commented 1 year ago

using postMessage and receiveMessageOnPort has some advantages... it can transfer all structured-cloneable objects, including things such as ReadableStreams, ArrayBufferViews, Dates, Error objects, Blobs/Files, RegExps, and everything else. Blobs & Files can easily just be reference points instead of fully cloned content

and while you're at it, you could also use the transferable option: instead of cloning a typed array you would transfer it. So using postMessage(data, [ transferList ]) has some advantages... but the con is that Deno, Bun, and browsers don't have receiveMessageOnPort

jcbhmr commented 1 year ago

@jimmywarting you're 100% right that looping over a 6 kB string to JSON.stringify() it is very costly 🤣 [benchmark screenshot] 😭😭😭

But the good news is that that can be fixed via:

  1. Having an additional SharedArrayBuffer return value (an int, probably) that indicates whether the result from the user's function is a number, string, buffer, typed array, or JSON-encoded object/array
  2. To support similar levels of objects as in Node.js, using a custom JSON.stringify() replacer (or even a v8.serialize() polyfill, if one exists) to serialize ES6 Map, Set, Request, Response, Headers, etc. https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm
  3. Not JSON-encoding the value if it's one of the "fast types" (number, boolean, string, buffer, typed array)
  4. Not decoding it if it's one of the "fast types"
  5. Using a custom deserializer that is compatible with all structuredClone() types to deserialize it
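A minimal sketch of the type-tag idea from steps 1, 3, and 4 (the tag constants and function names are hypothetical; real code would write the tag into an Int32 slot of the SharedArrayBuffer rather than return an object):

```javascript
// Hypothetical type-tag scheme: one int says how to decode the result,
// so "fast types" skip JSON entirely.
const TYPE_JSON = 0, TYPE_NUMBER = 1, TYPE_STRING = 2, TYPE_RAW_BYTES = 3;

function encodeResult(value) {
  if (typeof value === "number") return { tag: TYPE_NUMBER, text: String(value) };
  if (typeof value === "string") return { tag: TYPE_STRING, text: value };
  if (value instanceof Uint8Array) return { tag: TYPE_RAW_BYTES, bytes: value };
  return { tag: TYPE_JSON, text: JSON.stringify(value) }; // slow path
}

function decodeResult({ tag, text, bytes }) {
  switch (tag) {
    case TYPE_NUMBER: return Number(text);
    case TYPE_STRING: return text;       // no parse step at all
    case TYPE_RAW_BYTES: return bytes;   // no parse step at all
    default: return JSON.parse(text);
  }
}
```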

I'd need to test which of these is faster for ser/deser:

JSON.stringify(thing, (key, value) =>
  needsStructuredCloneSerialize(value)
    ? structuredCloneSerialize(value)
    : value)
// vs
structuredCloneSerialize(thing)

👆 cause honestly idk.
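For the replacer route, the general shape might be something like this (handling only Map and Set for illustration; the `$type` tag is a made-up convention, and a real version would need the full structured-clone type list):

```javascript
// Round-trip Map/Set through JSON by wrapping them in tagged objects.
function replacer(key, value) {
  if (value instanceof Map) return { $type: "Map", entries: [...value] };
  if (value instanceof Set) return { $type: "Set", values: [...value] };
  return value;
}

function reviver(key, value) {
  if (value && value.$type === "Map") return new Map(value.entries);
  if (value && value.$type === "Set") return new Set(value.values);
  return value;
}

const roundTripped = JSON.parse(
  JSON.stringify({ m: new Map([["a", 1]]), s: new Set([1, 2]) }, replacer),
  reviver
);
```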

jcbhmr commented 1 year ago

Is it just me, or is Deno just really, really slow?

Deno: [benchmark screenshot] Node.js: [benchmark screenshot]

jcbhmr commented 1 year ago

Tried converting the Deno benchmarks to the native Deno.bench() (https://deno.land/manual@v1.35.0/tools/benchmarker) and still got terrible results... 😭😭😭

[benchmark screenshot]

jcbhmr commented 1 year ago

@jimmywarting This is very interesting. Deno has a not-so-great postMessage() serialization and transfer procedure. This means that your trick of doing everything in a SharedArrayBuffer polling loop is orders of magnitude faster! Awesome trick! 👍
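For anyone following along, the SharedArrayBuffer handshake boils down to this (both sides shown inline here for illustration, without an actual worker):

```javascript
// Two Int32 slots in shared memory: [0] = done flag, [1] = result.
const lock = new Int32Array(new SharedArrayBuffer(8));

// --- worker side, once the result is ready ---
Atomics.store(lock, 1, 123); // write the result
Atomics.store(lock, 0, 1);   // flip the done flag
Atomics.notify(lock, 0);     // wake a blocked caller, if any

// --- caller side: block until the flag flips ---
// (returns "not-equal" immediately here because the flag is already 1)
Atomics.wait(lock, 0, 0, 1000);
const result = Atomics.load(lock, 1); // 123
```

No postMessage on the hot path at all, which is exactly why Deno's slow postMessage() doesn't hurt it.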

https://github.com/denoland/deno/issues/11561

[benchmark screenshot]