taoensso / sente

Realtime web comms library for Clojure/Script
https://www.taoensso.com/sente
Eclipse Public License 1.0

Compression for WebSockets payload #72

Closed matthiasn closed 9 years ago

matthiasn commented 9 years ago

First of all, thank you for writing Sente. I have found it easy to learn and it works really nicely in the new Clojure version of my BirdWatch project. There is one question I have though. In the previous version of my application which made use of Server-Sent Events, I was utilizing nginx for gzip-compressing larger chunks of previous data on-the-fly. I would like to do the same when delivering comparable chunks of data over the WebSockets connection. Are there any plans to support per-message compression as described in this article? Thanks, Matthias

ptaoussanis commented 9 years ago

Hey Matthias, you're very welcome!

Are there any plans to support per-message compression as described in this article?

Not specifically, but efficiency is a major priority for Sente. I'd be open to any+all ideas, but I'm not familiar enough with current WebSocket compression methods to comment off-hand (would need to dig into it a little).

One observation: It may be possible to get better browser support and better control by implementing our own payload compression scheme on top of standard, uncompressed WebSockets.

As of Sente v1.0.0, there's also an obvious+easy extension point for something like this: https://github.com/ptaoussanis/sente/blob/master/src/taoensso/sente/interfaces.cljx#L13

It'd be possible, for example, to write an adaptive compressing packer that examines payload contents and dynamically chooses a compression scheme based on some quick cost/benefit analysis. There's something similar happening already here: https://github.com/ptaoussanis/sente/blob/master/src/taoensso/sente/packers/transit.cljx#L53.
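To make the "adaptive compressing packer" idea concrete, here's a minimal JVM-side sketch of the cost/benefit decision (all names are hypothetical; a real packer would also need the unpack side, a ClojureScript counterpart, and to implement Sente's packer protocol):

```clojure
(import '[java.io ByteArrayOutputStream]
        '[java.util.zip GZIPOutputStream])

(defn gzip-bytes
  "Gzip a UTF-8 string, returning a byte array."
  ^bytes [^String s]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. baos)]
      (.write gz (.getBytes s "UTF-8")))
    (.toByteArray baos)))

(defn adaptively-pack
  "Compress an already-encoded payload string only when it's big enough
  that gzip is likely to win, and only if compression actually shrank it."
  [^String packed-str]
  (let [n (count packed-str)]
    (if (> n 1024)                         ; size threshold, for illustration
      (let [zipped (gzip-bytes packed-str)]
        (if (< (alength zipped) n)         ; keep only if it shrank
          {:compressed? true  :data zipped}
          {:compressed? false :data packed-str}))
      {:compressed? false :data packed-str})))
```

The receiving side would check the `:compressed?` flag (or a one-byte prefix on the wire) before decoding.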

I'll be a bit tied up over the next few weeks, so have no immediate plans for looking into any of this myself - but I'd absolutely welcome any ideas/data/experiments/PRs.

Final observation: unless you're sending very large payloads, you may find that de/compression time ends up dominating any benefits. Clojure's edn format is notoriously slow, for example, but my experiments have consistently shown that switching to a more efficient format ends up hurting more than helping in most cases, since the common pattern is frequent+small payloads rather than fewer+bigger payloads.

This is the kind of thing we could optimise around with per-message controls though (like we do now with Transit encoding).

Cheers! :-)

matthiasn commented 9 years ago

Hi Peter, thanks a lot for the quick and very detailed response, much appreciated. I think in my use case the best way would probably be to avoid larger payloads (700-900KB each) altogether by performing the aggregations on the server side and then only transmitting aggregates at a fraction of the size. But even at a tenth of the size, compression might still help; I'll try to look into that further.

I see that you added transit for packing data and I think that's great. However, is there a reason you use :edn in the new example? When I use :json in my application instead, the chunks of previous tweets load much faster compared to :edn, like down from 40 seconds to 24 seconds for 30 chunks of 500 tweets each. For my understanding: :json IS transit?

Cheers, Matthias

ptaoussanis commented 9 years ago

Hi Matthias, no problem :-)

I think in my use case the best way would probably be to avoid larger payloads (700-900KB each) altogether by performing the aggregations on the server side and then only transmit aggregates with a fraction of the size.

Just to clarify: there's nothing wrong with sending large payloads, it's just not typical; most events are less than 1kb in my experience, with a couple larger server>client ones now and again.

However is there a reason you use :edn in the new example?

The packer in the example actually allows a per-payload upgrade to json by prepending any value with ^:json. For example:

(chsk-send! <uid> {:foo :bar})      ; Will send via edn (more efficient since payload is small)
(chsk-send! <uid> ^:json <big-map>) ; Will send via Transit/json (more efficient if payload is large)

:edn is selected as the default since it seems to currently be more efficient in practice for payloads that encode in < ~200 chars. There's some benchmarking stuff in transit.cljx if you're curious to experiment yourself.
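If you'd like to measure where the crossover falls for your own data, a quick size-comparison sketch (assuming the com.cognitect/transit-clj dependency; the exact crossover point will vary with payload shape):

```clojure
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream])

(defn edn-len
  "Encoded length of x as an edn string."
  [x]
  (count (pr-str x)))

(defn transit-json-len
  "Encoded length of x as Transit/json bytes."
  [x]
  (let [out (ByteArrayOutputStream.)]
    (transit/write (transit/writer out :json) x)
    (.size out)))

;; Compare on your own payloads, e.g.:
;; (edn-len my-small-event)  vs (transit-json-len my-small-event)
;; (edn-len my-tweet-chunk)  vs (transit-json-len my-tweet-chunk)
```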

the chunks of previous tweets load much faster compared to :edn, like down from 40 seconds to 24 seconds for 30 chunks of 500 tweets each.

That'll depend entirely on the size of the data you're sending. Large payloads (>~200 encoded chars) will benefit from Transit/json encoding, otherwise edn will probably be both leaner + faster. The bigger the data, the more the Transit/json encoding will benefit.

If you use the flexi packer (as in the ref example) you can prepend your large payloads with ^:json to get the more efficient packing there.

This way you can select the most efficient encoding on a per-payload basis. Does that make sense?
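One Clojure detail worth noting here: `^:json` is reader metadata, so it only attaches at read time. For a value held in a var or local you'd attach the metadata explicitly with `with-meta` (assuming the flexi packer dispatches on the value's metadata; note that collections support metadata, but plain strings and numbers do not):

```clojure
(def big-map {:lots-of :data})

;; Runtime equivalent of the ^:json hint on a literal:
(def tagged (with-meta big-map {:json true}))

(meta tagged) ;; => {:json true}
;; then e.g.: (chsk-send! uid tagged)
```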

For my understanding: :json IS transit?

Transit's a format for ~losslessly encoding Clojure (edn) data to and from other formats (like JSON and MessagePack).

Given some Clojure (edn) data, we could encode it: as plain edn, as Transit/json [1], or as Transit/MessagePack [2].

[1] Speed and leanness benefits start kicking in only for larger values.
[2] Speed and leanness benefits start kicking in only for yet-larger values.
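An illustrative Transit round-trip on the JVM (assumes the com.cognitect/transit-clj dependency): the edn data goes out as JSON text and comes back as the same edn data.

```clojure
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream ByteArrayInputStream])

(let [out (ByteArrayOutputStream.)
      w   (transit/writer out :json)]        ; encode edn -> Transit/json
  (transit/write w {:foo :bar :xs [1 2 3]})
  (let [in (ByteArrayInputStream. (.toByteArray out))
        r  (transit/reader in :json)]        ; decode Transit/json -> edn
    (transit/read r)))
;; => {:foo :bar, :xs [1 2 3]}
```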

Does that make sense?

matthiasn commented 9 years ago

Hi Peter, very helpful, thanks a lot :-) Makes perfect sense to me. I will experiment with these options. Many of the messages are right around that size (~200 bytes) or much larger, so I probably won't gain a measurable advantage by using edn, but I could still try it.

One additional question if I may take up more of your time: I spent the weekend rewriting my application using the component library and it makes the whole design a lot cleaner, with zero dependencies between namespaces except for the main namespace which wires components together. That makes it a lot easier to reason about the architecture as a whole.

Now components implement the Lifecycle protocol, part of which is stop for tearing down resources. I wonder how I could correctly do that for what sente/make-channel-socket! created.

Here's the source for the specific component: communicator.clj

Thanks a lot in advance! Right now I just restart the JVM but in theory this could be used to start and tear down the entire system from within a REPL session. In that case I'd like to free resources where I can.

Cheers, Matthias

ptaoussanis commented 9 years ago

Now components implement the Lifecycle protocol, part of which is stop for tearing down resources. I wonder how I could correctly do that for what sente/make-channel-socket! created.

Sure. The only thing you might like[1] to do is stop any router loops you've started. The start-chsk-router! constructors return a (fn stop []) that you can call to have them shut down.

Other than that, Sente's already written so that chsks are self-contained (no global state). Multiple channel sockets can be opened/closed simultaneously and they won't affect one another.

[1] Since the routers are just (lightweight) go loops, and they're attached to a channel socket's particular channel - there's actually no real harm in just letting unused ones hang around in a parked state.
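A minimal sketch of how this could look as a component (constructor arguments and the handler are placeholders; make-channel-socket!'s exact signature depends on your Sente version):

```clojure
(require '[com.stuartsierra.component :as component]
         '[taoensso.sente :as sente])

(defrecord Communicator [chsk router-stop-fn]
  component/Lifecycle
  (start [this]
    (let [chsk    (sente/make-channel-socket! {}) ; args vary by Sente version
          stop-fn (sente/start-chsk-router!
                    (:ch-recv chsk)
                    (fn [event-msg]
                      (comment "handle event-msg here")))]
      (assoc this :chsk chsk :router-stop-fn stop-fn)))
  (stop [this]
    (when router-stop-fn (router-stop-fn)) ; stop the router go-loop
    (assoc this :chsk nil :router-stop-fn nil)))
```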

The following component libs also contain Sente modules, so may be worth checking out:
- By @danielsz: https://github.com/danielsz/system
- By @hugoduncan: https://github.com/palletops/leaven with https://github.com/palletops/bakery

Hope that helps, cheers! :-)

matthiasn commented 9 years ago

Definitely helps, thanks a bunch! Implemented that now and seems to work nicely.

Also the links are great. I was wondering how to structure an application on the ClojureScript side and leaven might help there.

Are you planning on attending any of the Clojure-related conferences?

ptaoussanis commented 9 years ago

Are you planning on attending any of the Clojure-related conferences?

Would love to, but unfortunately no time short/medium term. Have my hands full with a product launch now, then I've got research/prep for an upcoming project.

Will keep an eye out for EuroClojure next year since it's possible I'll be nearby around then.

matthiasn commented 9 years ago

That would be great, I'd like to buy you a beer then!

ptaoussanis commented 9 years ago

Sounds good, hopefully see you then! :-)