tinygraph / tinygraphio

Tiny graph data interchange file format
https://tinygraph.org
Creative Commons Attribution 4.0 International

Come up with a strategy for graph, node, edge properties #4

Open daniel-j-h opened 1 year ago

daniel-j-h commented 1 year ago

At the moment we compactly store a compressed sparse row graph, but we do not store any global graph, node, or edge properties with it. The thinking was that we want to get an MVP out as soon as possible, and we don't know yet what an interface for these properties should look like, or whether we should store properties in the graph format at all.

Use cases for properties include e.g. graph embeddings, node embeddings, edge embeddings, where we need to store fixed size tensors per graph, node, or edge, respectively.
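To make the embedding use case concrete, here is a minimal sketch of one possible layout: all node embeddings flattened row-major into a single float32 buffer with shape [n, d]. The function names and the layout are assumptions for illustration only; nothing like this exists in tinygraphio yet.

```python
from array import array


def pack_node_embeddings(embeddings):
    """Flatten n embeddings of equal length d into one float32 buffer.

    Hypothetical helper: returns (flat, shape) with shape == [n, d].
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    flat = array("f")  # "f" is a C float, i.e. float32 on common platforms
    for row in embeddings:
        if len(row) != dim:
            raise ValueError("all embeddings must have the same size")
        flat.extend(row)
    return flat, [len(embeddings), dim]


def node_embedding(flat, shape, node):
    """Return the embedding of a single node from the flat buffer."""
    count, dim = shape
    if not 0 <= node < count:
        raise IndexError("node id out of range")
    return flat[node * dim:(node + 1) * dim].tolist()
```

Edge embeddings would work the same way with edge ids instead of node ids, and a graph embedding is just the n = 1 case.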

Two tasks here

daniel-j-h commented 1 year ago

Here's a good starting point for tensors in protobuf (scroll down on the linked page):

https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

Copying the example for float32 tensors:

 // A sparse or dense rank-R tensor that stores data as floats (float32).
 message Float32Tensor   {
     // Each value in the vector. If keys is empty, this is treated as a
     // dense vector.
     repeated float values = 1 [packed = true];

     // If key is not empty, the vector is treated as sparse, with
     // each key specifying the location of the value in the sparse vector.
     repeated uint64 keys = 2 [packed = true];

     // An optional shape that allows the vector to represent a matrix.
     // For example, if shape = [ 10, 20 ], floor(keys[i] / 20) gives the row,
     // and keys[i] % 20 gives the column.
     // This also supports n-dimensional tensors.
     // Note: If the tensor is sparse, you must specify this value.
     repeated uint64 shape = 3 [packed = true];
 }
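
As a rough sketch of how a reader could interpret the values / keys / shape triple from the message above, in plain Python without any protobuf dependency: the helper names are hypothetical, and row-major key ordering for more than two dimensions is an assumption extrapolated from the [ 10, 20 ] example in the comment.

```python
from math import prod


def unravel_key(key, shape):
    """Convert a flat key into a per-dimension index tuple (row-major)."""
    if not 0 <= key < prod(shape):
        raise IndexError("key out of range for shape")
    index = []
    # Peel off the fastest-varying (last) dimension first.
    for dim in reversed(shape):
        index.append(key % dim)
        key //= dim
    return tuple(reversed(index))


def to_dense(values, keys, shape):
    """Expand a (values, keys, shape) triple into a flat dense list."""
    if not keys:
        # Empty keys: the values already form a dense vector.
        if shape and len(values) != prod(shape):
            raise ValueError("dense values do not match shape")
        return list(values)
    if not shape:
        raise ValueError("sparse tensors must specify a shape")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    dense = [0.0] * prod(shape)
    for key, value in zip(keys, values):
        dense[key] = value
    return dense
```

For shape = [ 10, 20 ], unravel_key(45, [10, 20]) gives row 2 and column 5, which matches floor(45 / 20) and 45 % 20 from the comment.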

daniel-j-h commented 1 year ago

Note that we might need boolean properties e.g. as in

and for that we should think about storing a dense bitset (bytes) with n bits for n nodes/edges.
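
A minimal sketch of such a dense bitset, assuming least-significant-bit-first order within each byte and assuming the number of nodes/edges n is stored elsewhere (the padding bits in the last byte are otherwise ambiguous). Both function names are hypothetical.

```python
def pack_bits(flags):
    """Pack n booleans into ceil(n / 8) bytes, bit i of the set = item i."""
    data = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            # Assumed bit order: item i lives in byte i // 8, bit i % 8.
            data[i // 8] |= 1 << (i % 8)
    return bytes(data)


def get_bit(data, i):
    """Read the boolean property of node/edge i from the packed bytes."""
    return bool((data[i // 8] >> (i % 8)) & 1)
```

With this layout a boolean property costs n / 8 bytes rounded up, and the resulting bytes value could go into a protobuf bytes field as-is.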