tensorflow/gnn

TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.
Apache License 2.0

Broadcast padding mask to align with labels #788

Closed: frytoli closed this issue 4 months ago

frytoli commented 4 months ago

I've been following the Input Pipeline guide, and I'm stuck on padding graph tensors and using the resulting mask to evaluate metrics during training. The "Padding" section of the guide shows how to pad graph tensors, and it includes this example code snippet:

  ...
  graph, mask = tfgnn.keras.layers.PadToTotalSizes(size_constraints)(graph)
  graph, labels = ... # Splitting the label off the *padded* tensor.
  mask = ...  # If necessary, broadcast from context to align with labels.
  ...

My question is about the last line of this snippet -- "broadcasting" the padding mask "from context to align with labels". I'm attempting to predict both node and edge labels, and I've computed the following size constraints for my dataset:

SizeConstraints(total_num_components=17, total_num_nodes={'segments': 16098}, total_num_edges={'beta-skeleton': 64256})
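
For reference, I believe I computed these with the helper from the Input Pipeline guide, along these lines (from memory, so the exact call may differ):

# Hypothetical sketch: derive tight size constraints from the unbatched
# training data, sized for batches of 16 graphs plus padding.
size_constraints = tfgnn.find_tight_size_constraints(
    train_dataset, target_batch_size=16)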

I also map my dataset like this:

def batch_and_split(graph):
    graph = graph.merge_batch_to_components()  # merge the input batch into one contiguously indexed graph with multiple components
    graph, mask = tfgnn.keras.layers.PadToTotalSizes(size_constraints)(graph)  # pad to fixed total sizes; mask marks real (vs. padding) components
    # readouts
    readout = tfgnn.keras.layers.Readout(feature_name="label")
    node_labels = readout(graph, node_set_name="segments")
    edge_labels = readout(graph, edge_set_name="beta-skeleton")
    graph = graph.remove_features(
        node_sets={"segments": ["label"]},
        edge_sets={"beta-skeleton": ["label"]},
    )
    # return
    return graph, (node_labels, edge_labels), mask

train_dataset = train_dataset.batch(16) \
    .map(batch_and_split)

And finally I compile my model like this:

... # pretty vanilla model definition/updates and such...
...
model = tf.keras.Model(inputs=[input_graph], outputs=[readout_node, readout_edge])

model.compile(
    tf.keras.optimizers.SGD(learning_rate=0.0002, momentum=0.9),
    loss = {
        "nodes": tf.keras.losses.BinaryCrossentropy(),
        "edges": tf.keras.losses.BinaryCrossentropy(),
    },
    weighted_metrics = {
        "nodes": tf.keras.metrics.BinaryCrossentropy(),
        "edges": tf.keras.metrics.BinaryCrossentropy(),
    }
)

During training I get an error about a mismatch between the size of the padding mask and my node/edge labels:

ValueError: Dimensions must be equal, but are 16098 and 17 for '{{node binary_crossentropy/weighted_loss/Mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Mean, binary_crossentropy/weighted_loss/Squeeze)' with input shapes: [16098], [17].

So, how can I "broadcast" the padding mask to fit the shapes of both my node and edge labels? Perhaps there's more that I'm doing incorrectly than just the mask bit. Thank you very much in advance!

arnoegw commented 4 months ago

Thanks for reaching out.

First off, let me clarify that tfgnn.keras.layers.Readout simply reads out a feature from an entire node set or edge set (akin to GraphTensor.node_sets[node_set_name][feature_name]), so you're trying to make a prediction for every node and every edge of the respective node/edge set. – Nothing wrong with that, just checking.
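
In code, for your node set these two should produce the same tensor:

node_labels_a = tfgnn.keras.layers.Readout(feature_name="label")(graph, node_set_name="segments")
node_labels_b = graph.node_sets["segments"]["label"]  # direct feature lookup, same values and shape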

Padding works by adding one or more components to the graph (with nodes and edges as necessary to achieve the given sizes). Correspondingly, the padding mask has shape [num_components]. The features of node set "segments" have a different shape, say, [num_segment_nodes, ...], so what the Input Pipeline guide wants you to do is

nodes_mask = tfgnn.broadcast_context_to_nodes(graph, "segments", feature_value=mask)
edges_mask = tfgnn.broadcast_context_to_edges(graph, "beta-skeleton", feature_value=mask)
return graph, (node_labels, edge_labels), (nodes_mask, edges_mask)
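
With your size constraints, the broadcast masks then line up exactly with the labels. A quick sanity check:

assert mask.shape == [17]           # one boolean per component; False marks padding
assert nodes_mask.shape == [16098]  # one entry per "segments" node, matching node_labels
assert edges_mask.shape == [64256]  # one entry per "beta-skeleton" edge, matching edge_labels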

PS: I'm not 100% sure right now Keras will handle the matching between a dict of losses, a list of outputs and a tuple of weights; please check carefully.

frytoli commented 4 months ago

Thank you for your detailed response! You are correct that I'm trying to make a prediction for every node and every edge of my input graphs, and your response makes sense. Also, after some research I changed my loss/weighted_metrics values to be lists, and I believe it's working.

I'm now encountering some unexpected results during training, specifically a large mismatch between the loss and weighted_metrics values for the same function, BinaryCrossentropy(). Is this normal behavior? The Input Pipeline docs again mention using weighted_metrics instead of metrics "so that the mask takes effect not just for the loss but also for the metrics." Are my node/edge masks actually being taken into account when the loss is evaluated? If they are, do you have an idea of where else this discrepancy may be originating?

Here's my new preprocessing function per your guidance:

def merge_and_split(graph):
    graph = graph.merge_batch_to_components()  # merge the input batch into one contiguously indexed graph with multiple components
    graph, mask = tfgnn.keras.layers.PadToTotalSizes(size_constraints)(graph)  # pad to fixed total sizes; mask marks real (vs. padding) components
    # readouts
    readout = tfgnn.keras.layers.Readout(feature_name="label")
    node_labels = readout(graph, node_set_name="segments")
    edge_labels = readout(graph, edge_set_name="beta-skeleton")
    # broadcast padding mask from context to node/edge labels
    node_mask = tfgnn.broadcast_context_to_nodes(graph, "segments", feature_value=mask)
    edge_mask = tfgnn.broadcast_context_to_edges(graph, "beta-skeleton", feature_value=mask)
    # split labels from graph
    graph = graph.remove_features(
        node_sets={"segments": ["label"]},
        edge_sets={"beta-skeleton": ["label"]},
    )
    assert "label" not in graph.node_sets["segments"].features
    assert "label" not in graph.edge_sets["beta-skeleton"].features

    # return
    return graph, (node_labels, edge_labels), (node_mask, edge_mask)

My model compilation:

model.compile(
    tf.keras.optimizers.SGD(learning_rate=0.0002, momentum=0.9),
    loss = [
        tf.keras.losses.BinaryCrossentropy(),
        tf.keras.losses.BinaryCrossentropy(),
    ],
    weighted_metrics = [
        tf.keras.metrics.BinaryCrossentropy(),
        tf.keras.metrics.BinaryCrossentropy(),
    ]
)

And here is an example result from the first training epoch -- notice the difference between the loss and the weighted metrics. As training progresses, the loss and metric values do get smaller (better), but because the loss starts out so low, I'm concerned that the model may not be penalized for padding values and may be learning that it can get by with predicting 0 in most cases:

4216/4216 [==============================] - ETA: 0s - loss: 0.0694 - nodes_loss: 0.0386 - edges_loss: 0.0308 - nodes_binary_crossentropy: 0.6478 - edges_binary_crossentropy: 0.6910 
arnoegw commented 4 months ago

That does indeed look strange. However, this is really a question about the behavior of Keras Model.fit(), not TF-GNN. Could it be that weighted metrics are a weighted average (i.e., divided by the total weight), while the loss is a weighted sum divided by the number of terms (i.e., training examples)? (That's how I'd compute the loss in each step to preserve the relative weighting of training examples between batches, but I'm just speculating here.)
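
If that's what's happening, the ratio of loss to weighted metric would roughly equal the fraction of real (unpadded) entries in a batch: your epoch numbers give 0.0386/0.6478 ≈ 0.06 for nodes and 0.0308/0.6910 ≈ 0.045 for edges, which would mean almost all slots in your padded batches are padding.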

I'd encourage you to find out what Keras does, perhaps by trying a much simpler model (constant predictions), and to report an issue there and/or here if any adjustments should be made to the documentation or behavior of either library. For general how-to questions, please use the StackOverflow tags [keras] and [tensorflow_gnn].
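
To make that experiment concrete, the two reductions can also be compared directly, outside any model. A minimal sketch (made-up numbers; the comments reflect my guess at the behavior, not verified):

import tensorflow as tf

bce_loss = tf.keras.losses.BinaryCrossentropy()     # standalone default reduction: sum over batch size
bce_metric = tf.keras.metrics.BinaryCrossentropy()  # a weighted running mean

y_true = tf.constant([[1.0], [1.0], [0.0], [0.0]])
y_pred = tf.constant([[0.9], [0.8], [0.5], [0.5]])
weights = tf.constant([1.0, 1.0, 0.0, 0.0])  # padding mask: last two rows are padding

print(bce_loss(y_true, y_pred, sample_weight=weights))   # my guess: sum(w*l) / 4, padding rows still count in the denominator
bce_metric.update_state(y_true, y_pred, sample_weight=weights)
print(bce_metric.result())                               # my guess: sum(w*l) / sum(w), padding rows ignored

If that guess holds, the first value comes out at half of the second here; with your far heavier padding, the gap would be much larger.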

frytoli commented 4 months ago

Sounds good! I appreciate your suggestions and help - thanks!