Compression of data transfer between server and client

vellamike commented 11 years ago

During the resolution of https://github.com/openworm/org.geppetto/issues/8 @tarelli @gidili and I realised that large SPH scenes result in a communications bottleneck. We decided that simulation information sent from server to client needs to be compressed in some way. I'll start the ball rolling with some ideas of how to do this:

Binary not JSON
Data compression
Select "rendering particles" at configuration stage - only these are sent, the others are used in calculation only.
Work out 3D Convex hulls of particles and send only those
Only send particles which have moved by a significant distance. The server will need to keep track of what has been displayed on the client.

vellamike commented 11 years ago

Could we have an example of one of the 2.2MB files which contain a frame in the big PCISPH scene? I'd like to see where different compression techniques get us.

tarelli commented 11 years ago

Here it is: https://gist.github.com/tarelli/6229869/raw/aa9f0d9089021a0ec4160ee36cdb18ac3115f5af/gistfile1.txt

vellamike commented 11 years ago

gzip compresses the 2.2MB file to 122KB -6% compression ratio.

msasinski commented 11 years ago

Few questions: 1 . Do we really need to transfer numbers with such a precision?

We can change JSON to display abbreviated names i.e t for type, p for position Using abbr. names alone saves about 500kb on an uncompressed file

But the above, plus compression is nothing compared to what we can save using binary.

vellamike commented 11 years ago

You read my mind, I was literally thinking that this second, most of the unique data is stored there.
I doubt this would make a difference with compression.
Really? Why do you think Binary will save so much compared to compression?. I estimate that by reducing the precision to 2 significant figures and using LZW compression we can get to about 1% compression.

On 14 August 2013 14:30, Mariusz Sasinski notifications@github.com wrote:

Few questions: 1 . Do we really need to transfer numbers with such a precision?

We can change JSON to display abbreviated names i.e t for type, p for position Using abbr. names alone saves about 500kb on an uncompressed file

But the above, plus compression is nothing compared to what we can save using binary.

— Reply to this email directly or view it on GitHubhttps://github.com/openworm/org.geppetto/issues/9#issuecomment-22635260 .

tarelli commented 11 years ago

Precision is definitely one of the low hanging fruits we can grab. For visualization purposes the current precision is not necessary.
Shouldn't make a difference once it's compressed.
Binary might save us in terms of performance for the unpacking I think but I would be surprised if it would give us completely different compression rates.

msasinski commented 11 years ago

While I agree that #2 will not make a huge difference, on a larger sets even 1% difference will be significant. In addition to that, please remember that we're dealing with strings here, and ever char removed from the string will help to save memory and processing power. sending 500kb less to gzip/lzw to compress may help quite a bit if we're trying to generate->compress->send->uncompress->process every 20ms or so.

As far as binary, I'm sure that it would help with memory, cpu, and transfer, and it also can be compressed (even 1% may help)

gidili commented 11 years ago

Nice MIke - I would've guessed compression would shrink the file around 60% but looks like I was dead wrong!

If we remove precision digits steady state fluctuations are going to disappear from the visualization, and the scene will look still while the simulation is still going on. Not that it matters, but if we can avoid it would be better in my opinion.

Given that compression will bring size down that much, I vote for implementing compression for now and as we work with bigger scenes see what other improvements are needed. Before it comes to removing precision digits I would like to see how binary representation actually fares vs compression.

vellamike commented 11 years ago

Giovanni - I see your point about the scene appearing static, but I'm sure if we go to 3sf rather than the current ~17 we will still see the same effect?

On 14 August 2013 14:51, Giovanni Idili notifications@github.com wrote:

Nice MIke - I would've guessed compression would shrink the file around 60% but looks like I was dead wrong!

If we remove precision digits steady state fluctuations are going to disappear from the visualization, and the scene will look still while the simulation is still going on. Not that it matters, but if we can avoid it would be better in my opinion.

Given that compression will bring size down that much, I vote for implementing compression for now and as we work with bigger scenes see what other improvements are needed. Before it comes to removing precision digits I would like to see how binary representation actually fares vs compression.

— Reply to this email directly or view it on GitHubhttps://github.com/openworm/org.geppetto/issues/9#issuecomment-22636638 .

msasinski commented 11 years ago

@gidili are you sure that using float with such a great precision will give us the right results anyway?

tarelli commented 11 years ago

We need to measure how many digits are needed for fluctuations to not disappear. I would be surprised if we need more than 6.

gidili commented 11 years ago

@vellamike @msasinski @tarelli you guys are right - I didn't look at the file, I wasn't expecting 15 decimal digits. In my experience working with those numbers 7 digits should be more than enough.

vellamike commented 11 years ago

Some more information on compression: http://buildnewgames.com/optimizing-websockets-bandwidth/

gidili commented 11 years ago

Some benchmarks on JSON compression: http://web-resource-optimization.blogspot.ie/2011/06/json-compression-algorithms.html

tarelli commented 10 years ago

By leaving only two decimal digits (during visualisation, not computation, fluctuations still happen) and shortening some names I managed to go from 2.2MB to 1.3MB for the big scene. Not enough but it's a start. I will be now looking at compression but we might have to wait on some further development related to Websocket compression, see this.

vellamike commented 10 years ago

Does it look the same with only two decimal digits in visualisation?

tarelli commented 10 years ago

Pretty much yes, I tried also with just one but then quantisation starts being visible.

vellamike commented 10 years ago

That's great. If you have a moment could you try and see what it looks like with fewer particles in visualisation? e.g drop every 2nd particle. We might be able to get by with very few particles (surface particles for instance).

That link seems encouraging.

tarelli commented 10 years ago

You can still figure out what the shape is (the higher the density of points the less it becomes noticeable) but in general I'm not a big fan of removing points "randomly". Also we can't get too far with it, sampling one point every three still gives circa 400KB with that big scene (which isn't really that big given future perspective) per each update and compression would still be needed. On the other hand calculating which ones are on the surface would be interesting to explore and I just sent an email which also covers that, do you know of any algorithm we could use in realtime to do this?

vellamike commented 10 years ago

It appears there are a number of ways to compute a 3D convex hullhttp://stackoverflow.com/questions/18416861/how-to-find-convex-hull-in-a-3-dimensional-space. However, I valuable information is then being lost. (Although to be fair it jus depends on exactly what you want to do).

I answered your emailhttps://groups.google.com/forum/#!topic/openworm-discuss/9GXYssTzyBw too so lets continue the discussion there.

On 11 December 2013 13:58, Matteo Cantarelli notifications@github.comwrote:

You can still figure out what the shape is (the higher the density of points the less it becomes noticeable) but in general I'm not a big fan of removing points "randomly". Also we can't get too far with it, sampling one point every three still gives circa 400KB with that big scene (which isn't be really that big given future perspective) per each update and compression would still be needed. On the other hand calculating which ones are on the surface would be interesting to explore and I just sent an emailhttps://groups.google.com/forum/#!topic/openworm-discuss/9GXYssTzyBwwhich also covers that, do you know of any algorithm we could use in realtime to do this?

— Reply to this email directly or view it on GitHubhttps://github.com/openworm/org.geppetto/issues/9#issuecomment-30321897 .

msasinski commented 10 years ago

Does Geppetto currently bundles the data into bigger chunks before sending it? Is gzip being used? If the answer is no to both questions, how hard would it be to bundle the data and send it every 500ms, and enable gzip?

tarelli commented 10 years ago

I'm still figuring out about "Enabling gzip". The proper way to do this is to have the server compressing and the browser decompressing based on a flag which is handshaked at the start. But we use WebSockets and it looks like the support for this is not available yet, see this. The alternative which stays is to manually compress the message inside the web socket servlet and then use javascript to decompress it but I don't even know whether this is a good idea attempting since I'm afraid performances of write/reading data would compensate the benefit of sending smaller updates. Any previous experience with this?

borismarin commented 10 years ago

I have some experience with compression for data acquisition/streaming over ethernet, and LZO seemed to offer the best throughput (though compression rates are smaller than lz77/gzip). We've also experimented with static Huffman, but evidently it is only decent for data with similar statistics (we had two trees, one for control and other for data streams, and that was the best cost/benefit solution). btw, https://github.com/ning/jvm-compressor-benchmark/wiki edit: LZ4 seems very promising as well. It is kind of new, haven't experimented with it yet...

vellamike commented 10 years ago

I've been thinking some more about this issue. Imagine we had one million particles (quite realistic for a worm swimming in deep water) and we wanted to stream at 20fps:

particles = 1e6
fps = 20
bytes_per_particle = 24
bytes_per_second = particles*bytes_per_particle*fps
=> B/s = 480000000.0 = ~0.5GB/s

I think transmitting this is hugely challenging, even if we did achieve the necessary 1% compression ratio (5MB/s), I can't imagine how the client machine is going to be decompressing such a huge quantity of data.

msasinski commented 10 years ago

@vellamike There is some additional overhead for TCP transport, and websocket packet. Also, in calculating bytes_per_particle have you considered that websockets use UTF-8 for transport? If we consider network collisions, latency and other stuff that will affect throughput then our situation looks very bleak :(

vellamike commented 10 years ago

This is an "optimistic" calculation so I haven't looked at second order effects like that, but I agree that transmitting all the SPH data as it is generated will not be possible with today's technology. On 21 Dec 2013 12:01, "Mariusz Sasinski" notifications@github.com wrote:

@vellamike https://github.com/vellamike There is some additional overhead for TCP transport, and websocket packet. Also, in calculating bytes_per_particle have you considered that websockets use UTF-8 for transport? If we consider network collisions, latency and other stuff that will affect throughput then our situation looks very bleak :(

— Reply to this email directly or view it on GitHubhttps://github.com/openworm/org.geppetto/issues/9#issuecomment-31062026 .

gidili commented 10 years ago

@vellamike thanks for raising this. As you have in the issue description and as we discussed before, one of the options to reduce the amount of data sent is to map particle set to 3D geometries and only send vertices. This would reduce the amount of data sent enormously (let's say a muscle cell is in the 100s of particles, to represent it we'd only send a dozen vertices or so). Sending particle by particle was never meant to be a long term solution. As we work with bigger and bigger scenes we are going to have to move away from sending all the particle coordinates, one way or the other.

vellamike commented 10 years ago

@gidili Agreed, but this still has issues:

It only solves a subset of the problem - visualisation of the fluid surface. If we need the particles themselves for calculation of things like viscosity,density,turbulence etc. I assume this will have to be done server-side.
Calculation of 3D geometries using convex hulls is computationally intensive, in general for d dimensions and N particles convex hull algorithms scale as O(N^[(d/2)+1]) ref so for our case the scaling will be O(N^(5/2)). It could easily turn out that computation of the convex hull is more intensive than the SPH algorithm itself.

gidili commented 10 years ago

@vellamike 1) yes you have the particles server-side and no need to send them / stream them anywhere to calculate that stuff. 2) There are many ways of "cheating" in terms of visualisation, we don't necessarily need the visual representation of what's going on to be point by point accurate (for example the fluid) as long as the simulation itself is.

vellamike commented 10 years ago

How would 1) work for someone who wanted to do mathematical analysis of the fluid flows? Would they need to write their own OSGI bundles or is there another way?

gidili commented 10 years ago

Well for starters I'd imagine that if you wanna just do fluids you don't need the worm or any of the other stuff, but this may be true or not. Regardless, for the direction things are going, you'd just select which particles you want to "watch" (a population of hundreds, thousands or millions) and only those values (position, velocity, etc.) will be sent to the client. There's gonna be a given amount of data that could be defined as "too much data", in those extreme scenarios it could be that you can only do that kind of stuff on a local network (or even localhost). We're gonna have to play the balancing game there and I am sure there's tons of other issues we are not considering, but it's good to be thinking about these problems now.

msasinski commented 10 years ago

One additional hurdle to overcome - when the PCISPH and other currently under performing parts of the system are optimize we will have to deal with larger amounts of data and have less time to do it. So if we improve Sibernetic code to work @24FPS from the current 6@FPS, instead of having to deal with 2MB every 20 ms we will have to deal with 8MB.

msasinski commented 10 years ago

@gidili @vellamike Our situation may not be as bleak as I've previously thought. We can improve our situation by implementing something similar to a proxy server. If we can have Geppetto and Sibernetic produce the data and send it to a tiny proxy server using UNIX Sockets or UDP, we should be able to handle larger amounts of data.

The proxy would take the data, store it in a small local buffer, do simple processing (compress, decimate) and send the dataset to a processing server.

Mini-proxy would not require large amounts of memory (even 100 steps would be just ~ 200MB), nor processing power so it would only slightly affect Geppetto/Sibernetic.

Processing server, on the other hand, would have all the time in the world to process the data further, save it or serve it. It could have support for TCP only, WebSockets, smoke signals etc. It could convert the data to convex, save to a database, HDF5 file, or act as websocket server. It could support multiple clients viewing same data.

This way we would decouple the code even further, and allow both Geppetto and Sibernetic to take advantage of the same functionality (processing, storing, serving).

*With this amount of data produced/processed nothing is simple.

vellamike commented 10 years ago

@msasinski would UNIX Sockets or UDP be able to handle 0.5GB/s? That's my conservative estimate for a scene we are likely to want to utilize.

msasinski commented 10 years ago

@vellamike I don't know, but it should be easy to find out. So let's do it:

Let's just test Unix Sockets.

in one xterm do this : nc -lU /tmp/owSocket > /dev/null

and in the other do this: sudo pv /dev/zero | nc -U /tmp/owSocket

My system results: 11.9GB 0:00:13 [ 985MB/s] - that's about 8Gb/s .. It looks like it's possible :) However it does use a lot of CPU, but at this point I think that's the only solution we have. Few of these 64 cores may have to be used for preprocessing :)

Another test with gzip sudo pv /dev/zero | gzip -c -1 | nc -U /tmp/socketTest clocks at 210MB/s

This is much worse than I expected, but it's just another hoop to jump through and not something impossible to overcome.

To summarize - if we can get GPU to do most of the heavy lifting, and use CPU to preprocess the data, it's possible that with few CPU cores and 1GBit connection we can deal with current limitations of our system, and make scalable. Of course it's not going to be easy and it may take a while but it should be possible.

msasinski commented 10 years ago

Now let's do some TCP We will use iperf for that

start iperf server

 iperf -s

and let's check our bandwidth (TCP) in another window run

 iperf -c localhost

On my system with TCP window size: 85.3 KByte I got 59.5 GBytes 51.1 Gbits/sec

Problem solved?

vellamike commented 10 years ago

Those numbers look pretty encouraging @msasinski ! Very impressive analysis.

One question - is /dev/zero a realistic way to test gzip? Perhaps I'm missing something, but isn't that just asking gzip to compress null characters, which should be computationally less intensive?

msasinski commented 10 years ago

@vellamike No, unfortunately these number are not realistic, but I was just checking if we can even dream about using this solution. There will be a lot of different things that will impact the final numbers, and at this point it's impossible to account for all of them. It's not going to be easy, but from my perspective we don't have any other choice ,but to try to make it work.

charles-cooper commented 10 years ago

http://www.bitmover.com/lmbench/

There are also multithreaded compression algorithms, check out pigz.

tarelli commented 10 years ago

@richstoner can BinaryJS be used without a nodejs server? We use websocket streamed from Java so we would need Java libraries server side and pure Javascript libraries client side.

Neurophile commented 10 years ago

Maybe I'm missing something, but why not render on the server side and use one of the many stock methods for sending compressed image data?

If you are wanting to stream anything over the internet, 5 mb/s (bits, not bytes) is going to be a realistic upper limit for all but the very highest tiers of internet service.

tarelli commented 10 years ago

If you render server side you are removing the possibility to interact with the scene (beside camera controls in the future it will also be possible to select different entities and explore components at different scales).

The current progress of this is that we have two branches, one for org.geppetto.core and one for org.geppetto.frontend. We are currently experimenting with LZ4. The problem with this now is that the Java library we are using to compress the data server side doesn't yet support the latest LZ4 stream specification, see this bug I opened and this other related. The library is open source and on GitHub, we could fork it and update it ourselves if anybody had the bandwidth to do it.

mlolson commented 10 years ago

I came across the "Snappy" algorithm yesterday, and I'm wondering if it might work for this. Apparently it is fast, although not quite as fast as lz4. Here are implementations in java and node.js:

https://github.com/xerial/snappy-java https://github.com/kesla/node-snappy

To get the node module to work on the browser side I've been trying to use browserify to create a standalone module, but so far this isn't quite working.

tarelli commented 10 years ago

Interesting! Keep me posted :)

perqa commented 9 years ago

Have you tried UDT?

From the home page @ http://udt.sourceforge.net/index.html:

"UDT is a reliable UDP based application level data transport protocol for distributed data intensive applications over wide area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The new protocol can transfer data at a much higher speed than TCP does. UDT is also a highly configurable framework that can accommodate various congestion control algorithms. (Presentation: PPT 450KB / Poster: PDF 435KB )"

domenic commented 9 years ago

As a first pass, I would suggest using gzip on the server side before sending the data, then unzipping on the client side with https://github.com/imaya/zlib.js (possibly in a web worker to avoid janking the user experience while it decompresses; remember your frame budget is ~16 ms to make the app feel responsive to user input).

It'd be interesting to see how much gain changing the file format to a binary one (possibly based on protocol buffers?) would gain in size and in decompression time, but the maintenance overhead would be an order of magnitude higher than just compression.

tarelli commented 9 years ago

@domenic do you still think you would have some time to help with this? :) the next Geppetto meeting will be on the 20th of January at 4pm GMT

frenkield commented 9 years ago

I did some experimentation with a compressed binary protocol. I also moved message compression and transmission into separate threads. On an EC2 GPU instance (g2.2xlarge) I'm getting about 8 fps in the browser for sphModel_Elastic.xml. This model has 16974 liquid particles and 1575 elastic particles.

Model: sphModel_Elastic.xml Liquid particles: 16974 Elastic particles: 1575 Browser fps: 8 Binary scene size (doubles): 600kb Binary compressed scene size: 260kb Binary compression time: 55ms SPH computation step: 26ms

For comparison: JSON scene size: 5mb JSON compressed scene size: 700kb

The binary representation is just 4 doubles per particle. One double for particle id and type, and 3 doubles for position. I haven't tried this with floats yet but I assume the compressed size would be around 200kb.

The g2.2xlarge is doing the whole SPH computation step in 26ms but the transport thread is only sending at about 8 fps. This is partly due to the 55ms required for message compression. But it seems like there are some other bottlenecks somewhere. Maybe I can get this up to 10 or 12 fps with a bit of refactoring.

The latest version of Virgo (3.6.3) supports the JSR-356 Java WebSocket 1.0 implementation. This has per-message deflate so that would eliminate the need for explicit compression. Chrome supports per-message deflate and maybe Firefox does as well but I'm not quite sure. So upgrading to Virgo 3.6.3 and refactoring GeppettoServlet to use JSR-356 might provide a good performance boost for the current implementation without any other code changes. I'm planning to give this a try next.

gidili commented 9 years ago

@frenkield that sounds quite promising, for comparison how many fps do you get in browser for the small elastic and small liquid scenes (default Geppetto samples) with the same setup?

Also I think that some of the contributors are on Virgo 3.6.3 already so upgrading shouldn't be a problem.

Thanks for looking into this, sorting it would be a major improvement!

frenkield commented 9 years ago

@gidili, I'm seeing up to 200 fps (scene frames processed per second really) with the small elastic scene. The small liquid scene is broken at the moment. I'll take a look later tonight and get that resolved.

I can leave it running for a few hours tomorrow if you want to take a look.

openworm / org.geppetto

Compression of data transfer between server and client #9