taoensso / sente

Realtime web comms library for Clojure/Script
https://www.taoensso.com/sente
Eclipse Public License 1.0
1.74k stars 193 forks source link

Memoized packing for pushing identical messages to large groups #276

Closed mhuebert closed 1 year ago

mhuebert commented 7 years ago

Is there any reason the transit packer doesn't include memoization of payloads?

The recommended way to broadcast a message to a group of subscribers is to loop through a list of users:

  (doseq [uid (pubsub/subscribers :my-topic)]
      (chsk-send! uid [:my-app/action {:text "This Is Not A Tweet", :payload some-data}]))

But if we have 1,000 subscribers, the message data is serialized for transport independently 1,000 times. As it's relatively easy to have a data payload that takes 1ms to pack, these milliseconds add up.

In a quick test, I added memoize to this line, and that did the trick. To minimize memory usage, I imagine encore/memoize* with a small cache size might be a good choice?

Ideas/feedback welcome, maybe I'm missing something.

ptaoussanis commented 7 years ago

Hey Matt, thanks for pinging about this!

You've spotted a definite inefficiency (just something I hadn't considered), and a good fix (with encore/memoize*).

One potential minor difficulty with using memoize* is that its caching/gc parameters aren't easily tuneable when used in a library.

In this case, think it might actually make sense to consider adding a new API fn for bulk broadcasting that'll just generate the payload once before looping.

Just have my hands quite full atm, but will think about it a little and get back to you.

Again, this was a good (and valuable) catch - and a nice clear report: much appreciated!

Cheers :-)

mhuebert commented 7 years ago

Hey, thanks for the quick reply + commit!

I figured this would touch on other concerns, like buffering. Thanks for the informative comments in the commit.

A bit off topic, but I'm not sure a separate issue is warranted or if this is covered elsewhere - are there tests that aren't committed into the repo? I see a testing profile in profile.clj, but I don't seem to see anything in the test directory. While experimenting locally I was wondering how you approached testing for performance w/ many connected clients.

theasp commented 7 years ago

I haven't looked at your changes, but I believe that Sente keeps the transit readers and writers for each client since #161. Is it still safe to memoize this? I think the memoized result would not be valid for any client other than the one it's writer is used for, based on how transit replaces data with shorter tokens.

ptaoussanis commented 7 years ago

@theasp Thanks for pinging about this Andrew.

The readers and writer are indeed cached, but I don't see how additionally caching particular reads or writes would be a problem?

Last I checked the implementation, Transit can be stateful for the duration of a particular read/write, but shouldn't automatically (to my knowledge) be stateful between separate reads/writes.

For example, it uses token replacement to help deduplicate content within a particular payload. Unless you're specifically requesting something like it though (and providing the necessary data), I don't see how/why it'd try provide deduplication between payloads.

Does that make sense?

I don't use Transit myself though, so may be missing something obvious?

theasp commented 7 years ago

Yes, it looks like you are correct.

cljs.user=> (require '[cognitect.transit :as t])
nil
cljs.user=> (def writer (t/writer :json))
#'cljs.user/writer
cljs.user=> (t/write writer [:msg {:a 1 :b 2}])
"[\"~:msg\",[\"^ \",\"~:a\",1,\"~:b\",2]]"
cljs.user=> (t/write writer [:msg {:a 1 :b 2}])
"[\"~:msg\",[\"^ \",\"~:a\",1,\"~:b\",2]]"
ptaoussanis commented 7 years ago

@theasp Thanks for the confirmation :-)

BTW I'll note that there could (conceivably) be other instances where serialized output is not identical between calls with the same input, but that would still be okay so long as the deserialization doesn't depend on any out-of-band state.

If freeze(x)->y, then we only need thaw(y)->x; we don't otherwise care about the interim y value in Sente's case (or for read/write caching).

ptaoussanis commented 1 year ago

Closing in favour of https://github.com/ptaoussanis/sente/issues/421