mdeff / cnn_graph

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
https://arxiv.org/abs/1606.09375
MIT License

laplacian diagonal elements for fake nodes #18

Open maosi-chen opened 6 years ago

maosi-chen commented 6 years ago

Hello Michaël,

My colleague told me that when I use the graph.laplacian function, I have to assign 0.0 to the diagonal elements of the L matrix on the fake nodes. The current function returns 1.0 on those elements.

My colleague's reasoning: on fake nodes the weights W are 0.0, so the degree d is 0.0. After d += np.spacing(0.0), d is very close to 0.0 (e.g. 4.9e-324). Then d = 1 / np.sqrt(d) is a very large number (e.g. 4.5e+161). (D * W * D)_ii is still 0.0 (because 0.0 times a large but finite number is 0.0), so L_ii = (I - D * W * D)_ii is 1.0.
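
For concreteness, a minimal sketch of that arithmetic (a hypothetical 3-node graph whose last node is fake; numbers assume the default float64 dtype):

import numpy as np
import scipy.sparse

# Hypothetical 3-node graph; node 2 is a fake node with no edges.
W = scipy.sparse.csr_matrix(np.array(
    [[0., 1., 0.],
     [1., 0., 0.],
     [0., 0., 0.]]))

d = W.sum(axis=0)                                  # matrix([[1., 1., 0.]])
D_comb = scipy.sparse.diags(d.A.squeeze(), 0)
print((D_comb - W).diagonal())                     # [1. 1. 0.]: combinatorial L is 0.0 on the fake node

d += np.spacing(np.array(0, W.dtype))              # fake node: ~4.9e-324
d = 1 / np.sqrt(d)                                 # fake node: ~4.5e+161
D = scipy.sparse.diags(d.A.squeeze(), 0)
L = scipy.sparse.identity(3, dtype=W.dtype) - D * W * D
print(L.diagonal())                                # [1. 1. 1.]: normalized L is 1.0 on the fake node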

The original Laplacian definition L = D - W tells us the original L has 0.0 on fake nodes (because D and W are both 0.0 there). Normalizing (squashing) the original L by D shouldn't change that 0.0 to 1.0.

Is my colleague right on this?

Thanks. Maosi Chen

maosi-chen commented 6 years ago

Here is our revised graph.laplacian function

import numpy as np
import scipy.sparse


def laplacian_r1(W, normalized=True):
    """Return the Laplacian of the weight matrix, with 0.0 on the diagonal
    for fake (zero-degree) nodes."""

    # Degree vector.
    orig_d = W.sum(axis=0)

    # Mask that is 1.0 on real nodes and 0.0 on fake (zero-degree) nodes.
    fake_node_mask = np.where(orig_d < 1e-30, 0.0, 1.0)
    fake_node_mask = np.asarray(fake_node_mask, dtype=W.dtype).squeeze()

    d = orig_d

    # Laplacian matrix.
    if not normalized:
        D = scipy.sparse.diags(d.A.squeeze(), 0)
        L = D - W
    else:
        d += np.spacing(np.array(0, W.dtype))
        d = 1 / np.sqrt(d)
        tmp_d = d.A.squeeze()
        if tmp_d.size > 1:
            D = scipy.sparse.diags(tmp_d, 0)
        else:
            D = scipy.sparse.diags(np.array([tmp_d]), 0)
        I = scipy.sparse.identity(d.size, dtype=W.dtype)
        L = I - D * W * D

    # On fake nodes d_i ~ 0, so (D * W * D)_ii ends up 0.0 and L_ii ends up 1.0.
    # Force those diagonal entries back to 0.0.
    L_diag_old = L.diagonal()
    L_diag_new = np.multiply(fake_node_mask, L_diag_old)
    L.setdiag(L_diag_new)

    # assert np.abs(L - L.T).mean() < 1e-9
    assert type(L) is scipy.sparse.csr.csr_matrix
    return L
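
As a quick check (a hypothetical 3-node graph whose last node is isolated):

W = scipy.sparse.csr_matrix(np.array(
    [[0., 1., 0.],
     [1., 0., 0.],
     [0., 0., 0.]]))
L = laplacian_r1(W, normalized=True)
print(L.diagonal())  # [1. 1. 0.]: the fake node's diagonal is now 0.0
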
mdeff commented 6 years ago

Hi Maosi,

Regarding definitions, you're correct that the diagonal of the combinatorial Laplacian L = D - W is zero on the fake nodes. It is however undefined for the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}, because of the division by zero.

The d += np.spacing(np.array(0, W.dtype)) is a choice I made to avoid the division by zero; it puts a value of 1 on those diagonal entries. Zero would be another choice. A one preserves the value at the node when diffusing (with y = Lx) and filtering (with polynomials of L), while a zero kills any value. In general, a one is thus preferred.
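
To illustrate (a hypothetical 3-node example where node 2 is fake; not code from the repo):

import numpy as np
import scipy.sparse

x = np.array([2.0, 3.0, 5.0])            # node 2 is a fake node

# Normalized Laplacian with a 1.0 on the fake node's diagonal (current behaviour):
L_one = scipy.sparse.csr_matrix(np.array(
    [[ 1., -1.,  0.],
     [-1.,  1.,  0.],
     [ 0.,  0.,  1.]]))

# Same Laplacian with a 0.0 on the fake node's diagonal (your revised function):
L_zero = scipy.sparse.csr_matrix(np.array(
    [[ 1., -1.,  0.],
     [-1.,  1.,  0.],
     [ 0.,  0.,  0.]]))

print(L_one @ x)    # [-1.  1.  5.]: the fake node's value is preserved
print(L_zero @ x)   # [-1.  1.  0.]: the fake node's value is killed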

For this application it should not matter, as the value at a fake node should be zero anyway. But I guess this reasoning is related to your fear of propagating the bias (#17), right? And killing the value should clear any doubt.

maosi-chen commented 6 years ago

Hi Michaël,

Yes, we found that setting the diagonal of L to one or zero on fake nodes does not change the model performance in my application, because the corresponding values in x are zero (we apply masks on x after adding the bias and after pooling).

For my application (spatial interpolation on dynamic nodes), cnn_graph currently shows a larger loss (~0.2) than the stacked bidirectional LSTM (~0.05) after both have stabilized. I think the change of graph (i.e. locations and number of nodes) across samples may be the cause, since the weights trained in cnn_graph are presumably meant to be applied on a fixed graph?

Thanks. Maosi

mdeff commented 6 years ago

Then it does not matter indeed.

The method was clearly developed with a fixed graph in mind, but filters can definitely be applied to multiple graphs (especially if they are local, i.e. of low polynomial order). Maybe you are destroying too much information with coarsening / pooling. I'm not sure I understand why you are doing this, but I saw in your other issue you had 7 layers, with the last graphs being mostly disconnected.
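
Concretely, "applying the same filter to multiple graphs" means the learned polynomial coefficients are shared while the Laplacian changes per sample. A rough sketch (a plain monomial polynomial here, whereas the repo uses a Chebyshev recurrence; graphs, signals and coefficients are illustrative):

import numpy as np
import scipy.sparse

def poly_filter(L, x, coeffs):
    """Apply the polynomial filter y = sum_k coeffs[k] * L^k x."""
    y = np.zeros_like(x)
    Lkx = x.copy()
    for c in coeffs:
        y += c * Lkx
        Lkx = L @ Lkx
    return y

coeffs = [0.5, -0.3, 0.1]  # shared (learned) coefficients, polynomial order 2

# Two graphs of different sizes: only L changes, the filter coefficients do not.
L_a = scipy.sparse.csr_matrix(np.array([[1., -1.], [-1., 1.]]))
L_b = scipy.sparse.csr_matrix(np.array(
    [[ 1., -1.,  0.],
     [-1.,  2., -1.],
     [ 0., -1.,  1.]]))

print(poly_filter(L_a, np.array([1., 0.]), coeffs))
print(poly_filter(L_b, np.array([1., 0., 0.]), coeffs))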