igraph / xdata-igraph

xdata igraph, has been merged into igraph/igraph
GNU General Public License v2.0

GraphML format for storing graphs #19

Closed by jovo 10 years ago

jovo commented 10 years ago

@gaborcsardi - seems like GraphML does indeed have many of the desired properties (@dmhembe1 - perhaps post the benchmarks for time & space in this comment thread).

however, it seems that the standard GraphML format only allows for certain vertex attribute types, including: boolean, int, long, float, double, or string (see http://graphml.graphdrawing.org/primer/graphml-primer.html#Attributes).

some of our vertex attribute types are vector valued, such as the latent positions. there seem to me to be 2 reasonable options here:

1) define an attribute for each dimension of the latent positions, so each vertex might have 100 latent position attributes, numbered in order of decreasing eigenvalues.

2) extend GraphML using the XML schema to allow for vector valued attributes.

i wonder if you have a different idea or a suggestion for us?
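Option 1 above could look roughly like the following. This is only a sketch using the Python standard library, not igraph's own GraphML writer; the attribute names `latent_1`, `latent_2`, ... and both function names are made up for illustration, and the reader assumes that every `<data>` element on a node is one of those latent attributes.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _tag(name):
    # Qualified tag name in ElementTree's "{namespace}local" notation.
    return f"{{{GRAPHML_NS}}}{name}"


def graphml_with_flat_vectors(latent, edges):
    """Build a GraphML string with one double attribute per latent dimension.

    latent: dict mapping vertex id (str) -> list of floats, all the same length
    edges:  list of (source id, target id) pairs
    """
    ET.register_namespace("", GRAPHML_NS)  # serialize without a prefix
    root = ET.Element(_tag("graphml"))
    dim = len(next(iter(latent.values())))

    # One <key> declaration per dimension; latent_1 is the first dimension
    # (the one with the largest eigenvalue, in the ordering proposed above).
    for k in range(1, dim + 1):
        ET.SubElement(root, _tag("key"), {
            "id": f"latent_{k}", "for": "node",
            "attr.name": f"latent_{k}", "attr.type": "double",
        })

    graph = ET.SubElement(root, _tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})
    for vid, vec in latent.items():
        node = ET.SubElement(graph, _tag("node"), {"id": vid})
        for k, value in enumerate(vec, start=1):
            data = ET.SubElement(node, _tag("data"), {"key": f"latent_{k}"})
            data.text = repr(float(value))
    for src, dst in edges:
        ET.SubElement(graph, _tag("edge"), {"source": src, "target": dst})
    return ET.tostring(root, encoding="unicode")


def read_flat_vectors(xml_text):
    """Reassemble each vertex's vector from its latent_k attributes."""
    ns = {"g": GRAPHML_NS}
    out = {}
    for node in ET.fromstring(xml_text).iterfind(".//g:node", ns):
        values = {d.get("key"): float(d.text) for d in node.iterfind("g:data", ns)}
        out[node.get("id")] = [values[f"latent_{k}"]
                               for k in range(1, len(values) + 1)]
    return out
```

The file stays within the standard scalar types, at the cost of one `<data>` element per vertex per dimension, which will add to the already large file sizes for 100-dimensional positions.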

disa-mhembere commented 10 years ago

Benchmarks for the brain graph (n=16M, m=45M). Compression = zip. All run on 1 core (32 GB RAM, 2.4 GHz).

Storage, mat format: 576 MB uncompressed, 79 MB compressed (33 sec)

Storage, GraphML: 4.9 GB uncompressed, 260 MB compressed (125 sec)

igraph, write GraphML from igraph: 138 sec

igraph, read from GraphML: 340 sec

(Posted by email in reply to joshua vogelstein's opening comment of Sun, Jan 12, 2014, 2:57 PM.)

jovo commented 10 years ago

thanks @dmhembe1. do you have an opinion on 1 vs 2 vs some 3rd alternative?

gaborcsardi commented 10 years ago

I forgot to say last time, that if you only use a data set from R, then just use save and load instead of a real file format. I guess it is much faster than parsing graphml and it is also compressed.

As for graphml, an alternative is to serialize complex attributes into strings and then de-serialize them after loading. This is not a very good solution, because it is platform-dependent. R serialization is not (well?) supported by Python and vice-versa.

The best solution would be to extend graphml, but that is a bigger piece of work, I mean to write the parser. Plus igraph's attribute handler internally does not support anything but boolean, string, and numeric scalars. So this will not happen very soon.
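For the string-serialization idea, a language-neutral text encoding avoids the R-versus-Python problem mentioned above. The sketch below uses JSON rather than R's native serialization, which is a swapped-in technique and not what the comment describes; the function names are hypothetical. The encoded string would be stored as an ordinary string-typed vertex attribute and decoded after loading.

```python
import json


def encode_vector(vec):
    """Serialize a numeric vector to a JSON string for a string-typed attribute."""
    return json.dumps([float(x) for x in vec])


def decode_vector(text):
    """Recover the vector from the JSON string after the graph is loaded."""
    return [float(x) for x in json.loads(text)]


# Example: the string is what would go into the GraphML attribute.
stored = encode_vector([0.25, -1.5, 3])
restored = decode_vector(stored)
```

Any system with a JSON parser can read these strings back (in R, for example, through a package such as jsonlite). Non-finite values (NaN, infinity) are not valid strict JSON and would need separate handling.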

jovo commented 10 years ago

> I forgot to say last time, that if you only use a data set from R, then just use save and load instead of a real file format. I guess it is much faster than parsing graphml and it is also compressed.

good to know.

> As for graphml, an alternative is to serialize complex attributes into strings and then de-serialize them after loading. This is not a very good solution, because it is platform-dependent. R serialization is not (well?) supported by Python and vice-versa.

> The best solution would be to extend graphml, but that is a bigger piece of work, I mean to write the parser. Plus igraph's attribute handler internally does not support anything but boolean, string, and numeric scalars. So this will not happen very soon.

ok, in that case, what do you recommend we do?

gaborcsardi commented 10 years ago

Well, as long as you are using load/save in R, this is not a problem. If you want a format that is portable to another system, then what seems easiest to me is saving these attributes to a different file. E.g. save a graphml file with all other attributes (this is kind of automatic, because the non-scalar attributes are ignored when saving graphml files anyway), then write the complex attributes to another file, and then zip them together if you like. (Or xzip them individually, or whatever you prefer.)
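The two-file approach could be sketched as follows, again with the Python standard library only. The archive member names `graph.graphml` and `latent.csv`, the CSV layout (vertex id followed by the vector components), and the function names are all assumptions for illustration; the GraphML text itself would come from whatever tool writes the graph.

```python
import csv
import io
import zipfile


def write_bundle(path, graphml_text, latent):
    """Write graph.graphml and latent.csv into one zip archive.

    latent: dict mapping vertex id (str) -> list of floats
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for vid, vec in latent.items():
        writer.writerow([vid, *vec])  # one row per vertex: id, x1, x2, ...
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("graph.graphml", graphml_text)
        zf.writestr("latent.csv", buf.getvalue())


def read_bundle(path):
    """Return (graphml_text, latent) from an archive made by write_bundle."""
    with zipfile.ZipFile(path) as zf:
        graphml_text = zf.read("graph.graphml").decode("utf-8")
        rows = csv.reader(io.StringIO(zf.read("latent.csv").decode("utf-8")))
        latent = {row[0]: [float(x) for x in row[1:]] for row in rows}
    return graphml_text, latent
```

Keying the rows by vertex id, rather than relying on row order, means the vectors can be matched back to vertices even if the other system reorders them when reading the GraphML file.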

jovo commented 10 years ago

based in part on your suggestion, we have decided on graphml. thanks.