Come up with a strategy for graph, node, edge properties

daniel-j-h commented 1 year ago

At the moment we compactly store a compressed sparse row graph, but we do not store any graph, node, or edge properties with it. The thinking was that we want to get an MVP out asap, and we don't know yet what an interface for these properties should look like or whether we should store properties in the graph format at all.

Use cases for properties include graph embeddings, node embeddings, and edge embeddings, where we need to store fixed-size tensors per graph, node, or edge, respectively.
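As a rough illustration of the "fixed-size tensor per node" use case, here is a minimal Python sketch that keeps all node embeddings in one flat float32 buffer next to the graph. The class name `NodeProperty` and its methods are hypothetical and not part of any existing interface; the flat row-major layout is just one possible choice.

```python
from array import array


class NodeProperty:
    """Hypothetical per-node property: one fixed-size float32 vector per node."""

    def __init__(self, num_nodes: int, dim: int) -> None:
        self.num_nodes = num_nodes
        self.dim = dim
        # One contiguous float32 buffer; node i owns values[i*dim : (i+1)*dim].
        self.values = array("f", [0.0]) * (num_nodes * dim)

    def _span(self, node: int) -> slice:
        if not 0 <= node < self.num_nodes:
            raise IndexError("node id out of range")
        return slice(node * self.dim, (node + 1) * self.dim)

    def get(self, node: int) -> array:
        """Return a copy of the vector stored for the given node."""
        return self.values[self._span(node)]

    def set(self, node: int, vector) -> None:
        """Overwrite the vector stored for the given node."""
        if len(vector) != self.dim:
            raise ValueError("vector length must equal dim")
        self.values[self._span(node)] = array("f", vector)


embeddings = NodeProperty(num_nodes=3, dim=2)
embeddings.set(1, [1.5, 2.5])
```

The same layout would work for edge properties by indexing with the edge's position in the compressed sparse row target array, and for a graph-level property by using a single row.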

Two tasks here

[ ] decide if we should store properties with the graph, and which ones (e.g. only int/float tensors?)
[ ] come up with an interface for them and decide how we store these properties

daniel-j-h commented 1 year ago

If you scroll down, here's a good starting point for tensors in protobuf

https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

copying the example for float32 tensors

 // A sparse or dense rank-R tensor that stores data as floats (float32).
 message Float32Tensor   {
     // Each value in the vector. If keys is empty, this is treated as a
     // dense vector.
     repeated float values = 1 [packed = true];

     // If keys is not empty, the vector is treated as sparse, with
     // each key specifying the location of the value in the sparse vector.
     repeated uint64 keys = 2 [packed = true];

     // An optional shape that allows the vector to represent a matrix.
     // For example, if shape = [ 10, 20 ], floor(keys[i] / 20) gives the row,
     // and keys[i] % 20 gives the column.
     // This also supports n-dimensional tensors.
     // Note: If the tensor is sparse, you must specify this value.
     repeated uint64 shape = 3 [packed = true];
 }
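To make the `keys` / `shape` comments above concrete, here is a small Python sketch of how a reader might decode such a message. The helper names `unravel` and `to_dense` are made up for illustration, and row-major ordering for more than two dimensions is an assumption extrapolated from the 2-D example in the comment.

```python
import math


def unravel(key, shape):
    """Convert a flat key into one index per dimension (row-major, assumed)."""
    index = []
    for extent in reversed(shape):
        key, position = divmod(key, extent)
        index.append(position)
    if key != 0:
        raise IndexError("key does not fit into shape")
    return tuple(reversed(index))


def to_dense(values, keys, shape):
    """Expand the values/keys/shape fields into a flat dense list."""
    if not keys:
        # Empty keys: the values are already dense.
        return list(values)
    if not shape:
        raise ValueError("sparse tensors must specify a shape")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    dense = [0.0] * math.prod(shape)
    for key, value in zip(keys, values):
        dense[key] = value
    return dense


# With shape = [10, 20], key 45 sits in row 45 // 20 and column 45 % 20.
row, column = unravel(45, [10, 20])
```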

daniel-j-h commented 1 year ago

Note that we might also need boolean properties, e.g. to express

here are all forward/reverse edges
here are all bidirectional edges
here are all train/validate nodes/edges

and for that we should think about storing a dense bitset (bytes) with n bits for n nodes/edges.
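A dense bitset like that could look roughly like the Python sketch below. The `Bitset` class is hypothetical, and the bit order (bit i lives in byte i // 8 at position i % 8, least significant bit first) is an assumption that a real format would have to pin down.

```python
class Bitset:
    """Hypothetical dense bitset: one bit per node or edge, packed into bytes."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        # Round up so that n bits always fit; unused trailing bits stay zero.
        self.data = bytearray((num_bits + 7) // 8)

    def _check(self, i: int) -> None:
        if not 0 <= i < self.num_bits:
            raise IndexError("bit index out of range")

    def set(self, i: int, value: bool = True) -> None:
        self._check(i)
        mask = 1 << (i & 7)
        if value:
            self.data[i >> 3] |= mask
        else:
            self.data[i >> 3] &= ~mask & 0xFF

    def test(self, i: int) -> bool:
        self._check(i)
        return bool((self.data[i >> 3] >> (i & 7)) & 1)

    def count(self) -> int:
        """Number of bits that are set."""
        return sum(bin(byte).count("1") for byte in self.data)

    def to_bytes(self) -> bytes:
        """Serialized form, e.g. for a protobuf bytes field."""
        return bytes(self.data)


# Mark nodes 0 and 9 out of 10 as belonging to the training split.
train_nodes = Bitset(10)
train_nodes.set(0)
train_nodes.set(9)
```

For n nodes or edges this takes ceil(n / 8) bytes, compared with n bytes for one bool per element.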
