uwescience / myria

Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
myria.cs.washington.edu
Other
112 stars 46 forks source link

Run-length-encoded column #725

Open dhalperi opened 9 years ago

senderista commented 9 years ago

Description?

dhalperi commented 9 years ago

You can ask @mbalazin to fill it in? Or maybe just look up what run-length encoding is...

senderista commented 9 years ago

I think we all know what RLE is, just thought you might have specific impl ideas...

dhalperi commented 9 years ago

Nope, nothing. Was just logging Magda's request :)

domoritz commented 9 years ago

While on disk or in memory? I feel RLE in the batches may not be too difficult and could give us a lot of benefit. However, it would still be transparent and the algorithms won't know about it and cannot benefit from it. So we would probably lose the compression after we do any operation unless we a) rewrite the operators b) recompress (which is expensive and doesn't give us much benefit in terms of memory savings).

Maybe I'm missing something but I feel this is something big and only doing a bit will not give us any benefit.

mbalazin commented 9 years ago

Column store research has indeed shown that, in order to benefit from compression, the operators must process directly the compressed data.

Magdalena Balazinska Jean Loup Baer Professor of Computer Science and Engineering Associate Professor, Dept. of Computer Science & Engineering Director of the IGERT PhD Program in Big Data / Data Science Senior Data Science Fellow of the eScience Institute University of Washington

On Tue, Mar 31, 2015 at 8:08 PM, Dominik Moritz notifications@github.com wrote:

While on disk or in memory? I feel RLE in the batches may not be too difficult and could give us a lot of benefit. However, it would still be transparent and the algorithms won't know about it and cannot benefit from it. So we would probably lose the compression after we do any operation unless we a) rewrite the operators b) recompress (which is expensive and doesn't give us much benefit in terms of memory savings).

Maybe I'm missing something but I feel this is something big and only doing a bit will not give us any benefit.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/myria/issues/725#issuecomment-88324658.

stechu commented 9 years ago

Implementing RLE in Myria is indeed very exciting. Many graph queries will looks more like in memory graph traversal thus could be potentially much faster. Now we even have in memory adjacency list (sort of)!

Also we could exploit RLE a little bit more by changing the current implementations of operators (aggregation, LeapFrog etc).

On Tue, Mar 31, 2015 at 10:10 PM, Magdalena Balazinska < notifications@github.com> wrote:

Column store research has indeed shown that, in order to benefit from compression, the operators must process directly the compressed data.

Magdalena Balazinska Jean Loup Baer Professor of Computer Science and Engineering Associate Professor, Dept. of Computer Science & Engineering Director of the IGERT PhD Program in Big Data / Data Science Senior Data Science Fellow of the eScience Institute University of Washington

On Tue, Mar 31, 2015 at 8:08 PM, Dominik Moritz notifications@github.com wrote:

While on disk or in memory? I feel RLE in the batches may not be too difficult and could give us a lot of benefit. However, it would still be transparent and the algorithms won't know about it and cannot benefit from it. So we would probably lose the compression after we do any operation unless we a) rewrite the operators b) recompress (which is expensive and doesn't give us much benefit in terms of memory savings).

Maybe I'm missing something but I feel this is something big and only doing a bit will not give us any benefit.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/myria/issues/725#issuecomment-88324658.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/myria/issues/725#issuecomment-88351975.

jingjingwang commented 9 years ago

Related to https://github.com/uwescience/myria/issues/754. More generally, we may want to let operators work in a columnar style. Maybe adopting RLE after that makes sense.