hyperbard / hyperbard

All the World's a (Hyper)Graph: A Data Drama (DSH 2023)
https://hyperbard.net
BSD 3-Clause "New" or "Revised" License

Provide (hyper-)graphs in a standard format (GraphML) #8

Open xamde opened 1 year ago

xamde commented 1 year ago

The documentation of the results in /graphdata is pretty short. It seems that each .nodes.csv file contains only the nodes, with the node ID in the first CSV column, and that each .edges.csv file contains the edges. For the edge files, however, it is not easy to tell automatically which columns are edge endpoints and which are edge properties, which makes converting the CSV files difficult.

Furthermore, what do the parts mw, hg, se, ce, w mean?

For the casual researcher who is just interested in benchmarking a graph tool, mass ingestion of the graph data is the goal. I'm not a Python guy, but this seems to be a standard way to write GraphML in Python: https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.graphml.write_graphml.html#networkx.readwrite.graphml.write_graphml

dataspider commented 1 year ago

Thank you for your interest in our data! It seems that the answers you are looking for are documented on pages 25-28 (“B.3 graphdata”) of this document: https://arxiv.org/abs/2206.08225

TL;DR “what do the parts mean”:

The documentation linked above tells you what the node, edge, and attribute columns are in each of the files. We deliberately opted for CSV over other formats as good parsers are available in all programming languages.

In case you are still confused about how to load the (hyper)graphs, you might want to check out https://github.com/hyperbard/hyperbard/blob/main/src/hyperbard/graph_io.py (note that the loaders there restrict to named characters by default – which can make a big difference depending on what you want to do).
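For readers who still want GraphML, here is a minimal sketch of converting an edge-list CSV into a GraphML document using only the Python standard library. It is not part of hyperbard. The function name `edges_csv_to_graphml`, the column names `node1`, `node2`, and `count`, and the toy rows are hypothetical placeholders; the actual endpoint and attribute columns of each file are the ones listed in the documentation linked above. The sketch also assumes ordinary undirected two-endpoint edges and does not handle the hypergraph files.

```python
import csv
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_csv_to_graphml(csv_text, source_col, target_col):
    """Convert edge-list CSV text into a GraphML string.

    source_col and target_col name the two endpoint columns (the caller
    must look these up in the documentation); every other column is
    exported as a string-typed edge attribute.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    attr_cols = [c for c in fieldnames if c not in (source_col, target_col)]

    root = ET.Element("graphml", xmlns=GRAPHML_NS)

    # GraphML requires one <key> declaration per attribute.
    for i, col in enumerate(attr_cols):
        ET.SubElement(
            root,
            "key",
            {"id": f"d{i}", "for": "edge", "attr.name": col, "attr.type": "string"},
        )

    # Assumption: edges are undirected.
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")

    # Collect node IDs in first-seen order (dicts preserve insertion order).
    node_ids = {}
    for row in rows:
        node_ids.setdefault(row[source_col], None)
        node_ids.setdefault(row[target_col], None)
    for node_id in node_ids:
        ET.SubElement(graph, "node", id=node_id)

    # One <edge> per CSV row, with remaining columns as <data> children.
    for n, row in enumerate(rows):
        edge = ET.SubElement(
            graph, "edge", id=f"e{n}", source=row[source_col], target=row[target_col]
        )
        for i, col in enumerate(attr_cols):
            data = ET.SubElement(edge, "data", key=f"d{i}")
            data.text = row[col]

    return ET.tostring(root, encoding="unicode")


# Toy input with hypothetical column names, not taken from /graphdata.
sample = "node1,node2,count\nA,B,3\nA,C,2\n"
xml_text = edges_csv_to_graphml(sample, "node1", "node2")
print(xml_text)
```

To use this on a real file, read its contents with `open(path, newline="").read()` and pass the endpoint column names from the documentation. If networkx is available, building an `nx.Graph` from the same rows and calling `write_graphml` (the function linked earlier in this thread) is a shorter route to the same result.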

Hopefully, the comments above provided some clarity. Let us know if you have further questions!