higra / Higra

Hierarchical Graph Analysis

Improve graph io #182

Open fguiotte opened 4 years ago

fguiotte commented 4 years ago

Pink graph I/O is very slow compared to tree I/O (for the same image):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   25.021   25.021   25.021   25.021 {built-in method higra.higram._read_graph_pink}
        1    1.480    1.480    1.480    1.480 {built-in method higra.higram._read_tree}

It would be nice to offer a binary save alternative.

PerretB commented 4 years ago

Yes, pinkio is convenient (and easy to implement) but clearly very inefficient, in both memory and time: it was written essentially to offer backward compatibility with the internal tools we were using before. I don't know of any standard binary graph format (the ones I found were all text-based), and any suggestion is welcome.

As a homemade solution, I think we can just store the source and target arrays obtained from UndirectedGraph.edge_list and rebuild the graph with UndirectedGraph.add_edges. Then, in Python, I would use an HDF5 file (with h5py) to store the arrays. The HDF5 format is very efficient, and it would be easy to also store vertex/edge attributes in the same file.
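A minimal sketch of that idea, assuming h5py and NumPy are installed. The function names `save_edge_list` and `load_edge_list`, the dataset names, and the file layout are hypothetical choices made for illustration, not an existing Higra API. The functions work on plain arrays so that they do not depend on Higra itself.

```python
import h5py
import numpy as np


def save_edge_list(path, num_vertices, sources, targets, edge_attributes=None):
    """Store a graph as two edge arrays plus optional per-edge attributes.

    Hypothetical layout: 'sources' and 'targets' datasets at the root,
    the vertex count as a file attribute, and one dataset per edge
    attribute inside the 'edge_attributes' group.
    """
    with h5py.File(path, "w") as f:
        f.attrs["num_vertices"] = int(num_vertices)
        f.create_dataset("sources", data=np.asarray(sources))
        f.create_dataset("targets", data=np.asarray(targets))
        group = f.create_group("edge_attributes")
        for name, values in (edge_attributes or {}).items():
            group.create_dataset(name, data=np.asarray(values))


def load_edge_list(path):
    """Read back what save_edge_list wrote, as in-memory NumPy arrays."""
    with h5py.File(path, "r") as f:
        num_vertices = int(f.attrs["num_vertices"])
        sources = f["sources"][:]
        targets = f["targets"][:]
        edge_attributes = {name: ds[:] for name, ds in f["edge_attributes"].items()}
    return num_vertices, sources, targets, edge_attributes
```

With Higra, the arrays would come from `sources, targets = graph.edge_list()` and the vertex count from `graph.num_vertices()`. After loading, the graph could be rebuilt with `graph = hg.UndirectedGraph(num_vertices)` followed by `graph.add_edges(sources, targets)`. Storing the vertex count separately keeps isolated vertices, which the edge list alone would lose.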

PerretB commented 2 years ago

Note that graphs are now picklable, which should be quite efficient but is not a long-term storage solution.