Graph Matching and LZW Compression

whock commented 9 years ago

A similar idea was brought up in a separate thread, but I was wondering what ways exist to compress the information present in a graph G and then use this compressed Gc to make certain tasks easier, e.g. graph matching (easier because there are fewer nodes and, depending on the algorithm, maybe the nodes left correspond to more "important" pieces of the graph, i.e. not spurious nodes that aren't found in both graphs?).

One method I found is Lempel-Ziv-Welch (LZW) compression [1] which uses a dictionary of bits representing pieces of input to encode and compress the input. It seems like the key advantage is when the input has repeated patterns: the dictionary only grows (once initialized) when un-encountered strings are encountered for the first time so as the input signal gets longer and longer (inevitably with repeating patterns if the input is some natural language) then the compression asymptotically approaches the maximum possible compression [2].

There have been attempts to apply this as a lossless compression to graphs [3] and so I have a couple questions:

1) Does this work? Is this a good way to compress graphs and if not are there any good ways of compressing graphs? 2a) If this does work, what are the limitations? What graphs would this work / not work on? 2b) and could we use this to make graph matching easier? As I alluded to above, with fewer nodes my intuition is that the matching problem gets easier.

[1] Welch, Terry (1984). "A Technique for High-Performance Data Compression" (PDF). Computer 17 (6): 8–19. doi:10.1109/MC.1984.1659158

[2] Ziv, J.; Lempel, A. (1978). "Compression of individual sequences via variable-rate coding" (PDF). IEEE Transactions on Information Theory 24 (5): 530. doi:10.1109/TIT.1978.1055934

[3] Efficient Lossless Compression of Trees and Graphs, Chen and Reif, Duke University Dept of Computer Science. https://www.cs.duke.edu/~reif/paper/chen/graph/graph.pdf

akim1 commented 9 years ago

This is mere speculation, but graphs should be able to be compressed. Images can be compressed, and graphs can be represented as images (e.g., using pixel intensity as a value of adjacency matrices)?

If there are patterns in the graphs then that should also be able to be compressed. Given that compression and decompression just adds to the computational process, unless it is intelligently compressed in such a way it preserves the graph feature of interest, my guess is that compressing graphs is probably not ideal for processing, since it may hide information you might potentially want.

mblohr commented 9 years ago

The Signal Sub-Graph paper (Vogelstein) is one approach to this.

open-connectome-classes / StatConn-Spring-2015-Info

Graph Matching and LZW Compression #219