mberr / torch-ppr

(Personalized) Page-Rank computation using PyTorch
MIT License

`torch.sparse.mm` breaking API changes #11

Closed by migalkin 2 years ago

migalkin commented 2 years ago

Suddenly, everything stopped working šŸ˜±, presumably because of changes to torch.sparse. Specifically, I am on PyTorch 1.10, the master branch of PyKEEN, and torch-ppr 0.0.5.

Problem 1: the allclose() check no longer passes: https://github.com/mberr/torch-ppr/blob/921898f1a4b7770e6cdd1931e935262e456eb3c9/src/torch_ppr/utils.py#L221-L222

MWE:

```python
import torch
from torch_ppr import page_rank

from pykeen.datasets import FB15k237

# Build an undirected edge index from the (head, tail) columns of FB15k-237
dataset = FB15k237(create_inverse_triples=False)
edges = dataset.training.mapped_triples[:, [0, 2]].t()
pr = page_rank(edge_index=torch.cat([edges, edges.flip(0)], dim=-1), num_nodes=dataset.num_entities)
```

which fails with

```
ValueError: Invalid column sum: tensor([1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]). expected 1.0
```

Looking into it with the debugger turned up a second issue:

Problem 2: the signature of torch.sparse.addmm has changed from the one used in power_iteration, so the API call fails with an unknown-keyword-argument error.

https://github.com/mberr/torch-ppr/blob/921898f1a4b7770e6cdd1931e935262e456eb3c9/src/torch_ppr/utils.py#L310

In fact, I can't find where the kwargs input, sparse, and dense come from, because the current signature has the less readable mat, mat1, mat2. I traced back as far as Torch 1.3.0 and still can't find where they originated. Where does this signature come from? šŸ˜…
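For reference, the currently documented signature is torch.sparse.addmm(mat, mat1, mat2, beta=1.0, alpha=1.0), where mat1 is the sparse operand. A minimal sketch of a call that passes everything positionally and thereby sidesteps the renamed keywords (shapes and names here are illustrative, not taken from torch-ppr itself):

```python
import torch

# Illustrative shapes, not torch-ppr's actual code.
n, k = 4, 2
adj = torch.eye(n).to_sparse()  # sparse COO matrix (the "mat1" argument)
x = torch.rand(n, k)            # dense matrix (the "mat2" argument)
bias = torch.rand(n, k)         # dense matrix added to the product (the "mat" argument)

# Computes beta * bias + alpha * (adj @ x); positional arguments avoid
# any dependence on the keyword names, which differ across versions.
out = torch.sparse.addmm(bias, adj, x)
print(out.shape)  # torch.Size([4, 2])
```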

My test env:

```
torch                 1.10.0
torch-ppr             0.0.5
```
mberr commented 2 years ago

On the current master and with

```
torch              1.11.0+cu113
```

the code runs through, though even there the signature seems to be mat, mat1, mat2 šŸ˜…

cthoyt commented 2 years ago

I've never set this up before, but can we get the unit tests to run on multiple versions of PyTorch?

mberr commented 2 years ago

Yes, cf. https://tox.wiki/en/latest/config.html#generating-environments
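For example, tox's generative environments could pin several PyTorch versions along these lines (a sketch; the factor names and test layout are hypothetical, not the project's actual tox.ini):

```ini
# Hypothetical tox.ini fragment; adapt factors to the project's setup.
[tox]
envlist = py39-torch{110,111}

[testenv]
deps =
    pytest
    torch110: torch==1.10.*
    torch111: torch==1.11.*
commands =
    pytest tests/
```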

mberr commented 2 years ago

I am not sure whether I want to support multiple versions though, in particular as the sparse API is currently (1.11) declared beta šŸ˜…

cthoyt commented 2 years ago

Okay, up to you.

mberr commented 2 years ago

@migalkin, for the first problem: if you do not provide a starting value yourself, it will be initialized to the default defined at https://github.com/mberr/torch-ppr/blob/921898f1a4b7770e6cdd1931e935262e456eb3c9/src/torch_ppr/utils.py#L260, which should sum to 1. So I guess this is a numerical problem here.
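A quick sketch of the effect (assuming a uniform start vector x0[i] = 1/n; the entity count is illustrative, and this is not the library's exact code):

```python
import torch

# A uniform vector of n entries 1/n sums to 1 in exact arithmetic,
# but float32 accumulation typically leaves a small residual error.
n = 14_541  # e.g., on the order of the FB15k-237 entity count
x0 = torch.full((n,), 1.0 / n)
print((x0.sum() - 1.0).abs().item())  # small but usually non-zero
```

Whether the allclose check then passes depends on how this residual compares to the chosen tolerances.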

mberr commented 2 years ago

I played around with it, and the error seems to occur when there is a large number of non-zero entries in one column. This is intuitive: the more values we sum, the more floating-point error can accumulate.

In natural graphs with hub nodes, this can occur.
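A small illustration (not the repo's code) of how the residual grows with the number of summed entries, which is what a hub column amounts to:

```python
import torch

# Each "column" is normalized to sum to exactly 1 in exact arithmetic;
# the float32 residual tends to grow with the number of non-zero entries.
for k in (10, 10_000, 10_000_000):
    col = torch.full((k,), 1.0 / k)
    err = (col.sum() - 1.0).abs().item()
    print(f"{k:>10} non-zeros -> |sum - 1| = {err:.3e}")
```

The usual mitigations would be loosening the tolerances of the check or accumulating the sum in float64.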