tensorflow / gnn

TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

Faster tfgnn.write_example / serialization #814

Closed by juanmirocks 1 month ago

juanmirocks commented 3 months ago


For relatively large graphs, the operations example = tfgnn.write_example(graph); writer.write(example.SerializeToString()) used to serialize a GraphTensor are, in my experience, very slow. I wonder if there is a faster serialization alternative.
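
For reference, here is that pattern spelled out as a complete sketch; graphs (an iterable of scalar GraphTensors) and the output path are placeholders for illustration:

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

# Serialize each GraphTensor to a tf.train.Example proto and write it out.
with tf.io.TFRecordWriter("graphs.tfrecord") as writer:
    for graph in graphs:  # assumed iterable of scalar tfgnn.GraphTensor values
        example = tfgnn.write_example(graph)        # GraphTensor -> tf.train.Example
        writer.write(example.SerializeToString())   # proto -> bytes on disk
```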

In particular, I'm trying to create a serving model behind tensorflow-serving that deserializes a string SymbolicTensor back into a GraphTensor. The provided tfgnn.keras.layers.ParseExample layer works fine; a sketch of that path follows below. But I'm trying to substitute it with another kind of deserialization (and a corresponding serialization on the writing side). For instance, as an experiment, I tried pickle.dumps followed by pickle.loads in a sub-classed layer of the serving model, but so far that hasn't worked: the input is a SymbolicTensor, so the .numpy() method is not available. I was also unable to write such a layer in a way that could be serialized into a saved model.
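
A minimal sketch of the working ParseExample path, assuming graph_spec is the GraphTensorSpec matching the serialized examples and model is the trained TF-GNN model:

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

# Serving wrapper: serialized tf.Example strings in, model outputs out.
serialized = tf.keras.Input(shape=[], dtype=tf.string, name="examples")
graph = tfgnn.keras.layers.ParseExample(graph_spec)(serialized)  # assumed spec
outputs = model(graph)  # trained model; may need merge_batch_to_components()
serving_model = tf.keras.Model(serialized, outputs)
```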

arnoegw commented 2 months ago

TensorFlow Serving is meant to be used with tf.Example inputs. You can try to create alternatives, but be aware that TensorFlow Serving deals only in C++ TensorFlow (no Python code, and especially no pickle), so this would likely involve creating a custom TF op.

The common way of creating tf.Examples representing GraphTensors is to populate these protocol buffer messages directly, without going through TensorFlow; see the Data Preparation and Sampling and Beam Sampler guides.
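
A hand-rolled sketch of that direct proto construction, with no TensorFlow ops involved; the feature-name scheme (nodes/<set>.#size, edges/<set>.#source, ...) follows the Data Preparation guide, and the node/edge sets here are made up:

```python
import tensorflow as tf  # used only for the tf.train.* proto classes

def _int64(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _float(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# A tiny made-up graph: node set "atoms" (3 nodes), edge set "bonds" (2 edges).
example = tf.train.Example(features=tf.train.Features(feature={
    "nodes/atoms.#size": _int64([3]),
    "nodes/atoms.charge": _float([0.0, -1.0, 1.0]),
    "edges/bonds.#size": _int64([2]),
    "edges/bonds.#source": _int64([0, 1]),  # indices into node set "atoms"
    "edges/bonds.#target": _int64([1, 2]),
}))
serialized = example.SerializeToString()
```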

By contrast, tfgnn.write_example() is more useful for closing the loop from tfgnn.GraphTensor to tf.Example and back in demos, tests, and other not-so-large data. Notice that it's a pure-Python utility; TensorFlow does not provide any ops for efficient bulk creation of tf.Example protos. So, notwithstanding your valid feedback, I'd like to argue it's more or less working as designed.
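
For completeness, that demo/test-scale round trip looks roughly like this, assuming graph is a scalar tfgnn.GraphTensor and spec is its matching GraphTensorSpec (e.g. graph.spec):

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

example = tfgnn.write_example(graph)                # pure-Python proto building
serialized = tf.constant(example.SerializeToString())
restored = tfgnn.parse_single_example(spec, serialized)  # back to a GraphTensor
```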

juanmirocks commented 2 months ago

Thank you @arnoegw for your considerations & thoughts.

So far, I have ended up discarding the other kinds of serialization and instead avoiding serialization altogether. That is, I now load the tfgnn model in memory, derive the input tfgnn.GraphTensor, and pass it to the model directly for inference, so there is no tfgnn.write_example() and no tf-serving involved. In my case, that improved speed a lot.
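
Roughly, the serialization-free path looks like this; build_graph_tensor stands in for whatever code derives the input GraphTensor, and the model path is illustrative:

```python
import tensorflow as tf

# Load the trained TF-GNN model once and keep it in memory.
model = tf.keras.models.load_model("path/to/saved_model")

graph = build_graph_tensor(raw_inputs)  # assumed helper; yields a GraphTensor
predictions = model(graph)              # direct call; no tf.Example round trip
```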

arnoegw commented 1 month ago

Glad to hear! I think that's a fitting solution.

I'd like to close this issue, as there is nothing left to do for TF-GNN.