pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[Feature request] create sparse coo matrix w/o index check #14945

Open · jermainewang opened this issue 5 years ago

jermainewang commented 5 years ago

🚀 Feature

Currently, creating a sparse COO matrix requires checking that all provided indices are within bounds. This check is costly and sometimes unnecessary (e.g., when the sparse matrix comes from a graph adjacency matrix). Request: provide a way to create a sparse COO matrix without the index check.

Motivation

The recent v1.0 update breaks an API used by the DGL project. We are a library for deep learning on graphs, built on top of frameworks like PyTorch. Sparse matrices are critical for implementing efficient message passing algorithms. We found that creating a sparse matrix in PyTorch is slow due to the unnecessary index check (our graph construction guarantees that all indices are within bounds). As a result, we used the internal API _sparse_coo_tensor_unsafe in the v0.4 era, which was removed in v1.0. It would be great if PyTorch could support this.

Pitch

Add a flag called validate_index to the sparse_coo_matrix interface. It should default to True; when set to False, the index check is skipped.
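
A hypothetical usage sketch of the pitch (validate_index is not a real PyTorch argument; it is exactly the flag being requested here):

import torch

indices = torch.tensor([[0, 1], [1, 0]])
values = torch.ones(2)

# Hypothetical: validate_index=False would skip the bounds check entirely.
# This argument does NOT exist in PyTorch; it is the proposal above.
t = torch.sparse_coo_tensor(indices, values, (2, 2), validate_index=False)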

Alternatives

No alternatives. The internal API has been removed.

cc @aocsa @nikitaved @pearu @mruberry @VitalyFedyunin @ngimel

soumith commented 5 years ago

DGL looks really nice. I attended the tutorial at NYU last week. Sorry for breaking your project, but as you mentioned, you were using an internal API :)

We can definitely introduce a validate_index in the constructor of sparse_coo_matrix. cc: @weiyangfb

However, this will not go live in binaries until 1.0.1 or 1.1. Do you currently have a workaround to unblock DGL on PyTorch 1.0?

jermainewang commented 5 years ago

Thanks @soumith, I'm glad you like DGL :). Unfortunately, I cannot think of an alternative other than using sparse_coo_matrix for torch 1.0 (https://github.com/dmlc/dgl/pull/282). It will hurt the speed of models that need to create adjacency matrices on the fly (such as those with many small graph samples, or models with dynamic graphs). We would really appreciate it if you could include this feature in a 1.0.1 patch.

jermainewang commented 5 years ago

Also, here are some concrete numbers about the performance impact, measured on my old K20 GPU. Under 0.4.1.post2, one epoch of our GCN example takes 0.0109s. Under 1.0.0, switched to sparse_coo_matrix, one epoch takes 0.0125s. That's about 15% slower, sadly.

ssnl commented 5 years ago

There is a workaround using _indices, _values, _coalesced_, and set_. But it is fairly hacky and unsafe, so I don’t know if I should recommend it...

soumith commented 5 years ago

@SsnL it's going to be in an internal routine for DGL, so it seems like it's fine to recommend it. See: https://github.com/dmlc/dgl/pull/282/files

ssnl commented 5 years ago

Okay, here you go @jermainewang

>>> def sp(index, data, size, coalesced=False):
...     t = torch.sparse_coo_tensor(size=size, device=index.device, dtype=data.dtype)
...     t._indices().set_(index)
...     t._values().set_(data)
...     t._coalesced_(coalesced)
...     return t
...
>>> sp(torch.arange(4).reshape(2, 2), torch.randn(2, 3, 3), [10, 10, 3, 3], True)
tensor(indices=tensor([[0, 1],
                       [2, 3]]),
       values=tensor([[[ 0.2279, -2.4789,  0.8396],
                       [-0.7172,  1.1474, -0.3711],
                       [ 0.7415, -0.5742,  1.1273]],

                      [[ 0.9689, -0.2353, -1.4202],
                       [-1.1111,  0.1783, -0.2180],
                       [-0.9308,  0.5249, -0.1092]]]),
       size=(10, 10, 3, 3), nnz=2, layout=torch.sparse_coo)
>>> sp(torch.arange(4).reshape(2, 2).cuda(1), torch.randn(2, 3, 3).cuda(1).double(), [10, 10, 3, 3], True)
tensor(indices=tensor([[0, 1],
                       [2, 3]]),
       values=tensor([[[-0.0173, -1.0477,  0.3912],
                       [-0.7270,  1.6093, -0.7889],
                       [ 1.2058,  0.8473, -1.3683]],

                      [[-1.1343, -0.3197,  0.5396],
                       [ 0.2474, -1.3241, -1.7152],
                       [ 1.0082, -2.2226,  0.9491]]]),
       device='cuda:1', size=(10, 10, 3, 3), nnz=2, dtype=torch.float64,
       layout=torch.sparse_coo)

ssnl commented 5 years ago

:) I should really finish the sparse docs some time.

jermainewang commented 5 years ago

@SsnL Is this compatible with autograd? set_ seems to be an in-place operation.

soumith commented 5 years ago

yes it should be fine with autograd.

ssnl commented 5 years ago

@jermainewang It depends on what you mean by "compatible with autograd". If you mean "whether the result can be used in autograd as a leaf collecting gradients, or as a gradient for some variable", then yes, it can. If you mean "whether you can backprop from the output sparse tensor to the input data tensor", then no. However, the latter case was only enabled by a patch after 0.4.1, so I doubt that's what you mean.
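
A minimal sketch of the first sense (using the public constructor for brevity, and torch.spmm as elsewhere in this thread): the sparse tensor acts as a constant, and gradients flow through the sparse op to the dense operand.

import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.ones(2)
A = torch.sparse_coo_tensor(i, v, (2, 2))
X = torch.randn(2, 3, requires_grad=True)
Y = torch.spmm(A, X)   # sparse-dense matmul
Y.sum().backward()     # gradient flows through the sparse op to X
print(X.grad)          # dense (2, 3) gradient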

jermainewang commented 5 years ago

Thanks for the explanation. Support for backprop through the data tensor is very useful for models like graph attention networks. We had actually planned to support this via custom ops back in the 0.4.1 days. Good to know that this is finally supported in v1.0!
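
A minimal sketch of that now-supported path (a sketch assuming a v1.0+ build where the public constructor is differentiable w.r.t. values, per the comment above; the hacky sp() workaround does not support this):

import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.ones(2, requires_grad=True)
A = torch.sparse_coo_tensor(i, v, (2, 2))  # differentiable w.r.t. v
X = torch.randn(2, 3)
loss = torch.spmm(A, X).sum()
loss.backward()        # gradient reaches the stored nonzero values
print(v.grad)          # shape (2,)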

jermainewang commented 5 years ago

@SsnL I tested your code a bit and I'm confused by the coalesced flag. Here is an example:

import torch
import numpy as np

def sp(index, data, size, coalesced=False):
    # Hack: make an empty sparse tensor, then overwrite its index and value
    # storages in place and set the coalesced flag directly.
    t = torch.sparse_coo_tensor(size=size, device=index.device, dtype=data.dtype)
    t._indices().set_(index)
    t._values().set_(data)
    t._coalesced_(coalesced)
    return t

N = 10

# Ones at positions (i, i+1) and (i, i).
row = torch.cat([torch.arange(N-1), torch.arange(N)])
col = torch.cat([torch.arange(N-1) + 1, torch.arange(N)])
index = torch.cat([row.unsqueeze(0), col.unsqueeze(0)])
data = torch.ones((len(row),))
size = (N, N)

spmat = sp(index, data, size, True)
print(spmat.to_dense())
X = torch.arange(N).unsqueeze(1).float()
Y = torch.spmm(spmat, X)
print(Y)

np_spmat = spmat.to_dense().numpy()
np_X = X.numpy()
np_Y = np.dot(np_spmat, np_X)
print(np_Y)

The sparse matrix I created here is as follows:

tensor([[1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 1.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

Based on the explanation in the doc, the indices contain no duplicate entries, so they should be coalesced. However, the result of an spmm is wrong. What I get is:

# torch's result
tensor([[45.],
        [ 1.],
        [ 2.],
        [ 3.],
        [ 4.],
        [ 5.],
        [ 6.],
        [ 7.],
        [ 8.],
        [ 9.]])
# numpy's result (also the correct result)
[[ 1.]
 [ 3.]
 [ 5.]
 [ 7.]
 [ 9.]
 [11.]
 [13.]
 [15.]
 [17.]
 [ 9.]]

weiyangfb commented 5 years ago

Hi @jermainewang, I think your code only works with the coalesce() call:

import torch
import numpy as np

def sp(index, data, size, coalesced=False):
    t = torch.sparse_coo_tensor(size=size, device=index.device, dtype=data.dtype)
    t._indices().set_(index)
    t._values().set_(data)
    t._coalesced_(coalesced)
    return t

N = 10

# Ones at positions (i, i+1) and (i, i).
row = torch.cat([torch.arange(N-1), torch.arange(N)])
col = torch.cat([torch.arange(N-1) + 1, torch.arange(N)])
index = torch.cat([row.unsqueeze(0), col.unsqueeze(0)])
data = torch.ones((len(row),))
size = (N, N)

spmat = sp(index, data, size).coalesce()
print(spmat.to_dense())
X = torch.arange(N).unsqueeze(1).float()
Y = torch.spmm(spmat, X)
print(Y)

np_spmat = spmat.to_dense().numpy()
np_X = X.numpy()
np_Y = np.dot(np_spmat, np_X)
print(np_Y)

The reason is that the sparse tensor you created wasn't really coalesced: a sparse tensor is coalesced only when its indices are unique and sorted (lexicographically). Your indices contain no duplicates, but the two concatenated blocks are not sorted.
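
A small check of that definition (a sketch using the public constructor; coalesce() both sorts the indices and sums duplicates):

import torch

i = torch.tensor([[1, 0],
                  [1, 0]])        # unique indices, but (1,1) precedes (0,0)
v = torch.ones(2)
t = torch.sparse_coo_tensor(i, v, (2, 2))
print(t.is_coalesced())           # False: the constructor never coalesces
c = t.coalesce()                  # sorts the indices (and sums duplicates)
print(c.indices())                # now (0,0) then (1,1)
print(c.is_coalesced())           # True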

jermainewang commented 5 years ago

I see. Perhaps you should put the definition in the documentation. Another general question: what is the suggested best practice for coalescing? Should we always coalesce indices when creating a sparse matrix? Is there any performance difference between spmm on an uncoalesced tensor and on a coalesced one?

weiyangfb commented 5 years ago

@jermainewang That's a good question! When using sparse ops, you don't have to worry about coalescing, as the ops handle both coalesced and uncoalesced inputs. For performance: coalesce() is not cheap in general, so calling a sparse op on an already-coalesced sparse tensor is almost always faster than on an uncoalesced one whenever the op requires the coalesced invariant (otherwise it has to coalesce internally first).
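
A rough timing sketch of this trade-off (a sketch only: numbers are machine-dependent, and it assumes spmm coalesces an uncoalesced input internally, as described above):

import time
import torch

N, nnz = 5000, 500_000
i = torch.randint(0, N, (2, nnz))
v = torch.randn(nnz)
A = torch.sparse_coo_tensor(i, v, (N, N))  # uncoalesced (may hold duplicates)
Ac = A.coalesce()                          # pay the coalesce cost once, up front
X = torch.randn(N, 64)

for label, mat in [("uncoalesced", A), ("coalesced", Ac)]:
    start = time.perf_counter()
    for _ in range(10):
        torch.spmm(mat, X)
    print(label, time.perf_counter() - start)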

And yes, let me add the coalesce definition to the sparse page.

jermainewang commented 5 years ago

Thanks for the explanation. Now I understand. I will leave this issue open in case you want to refer to it when implementing the feature.

KiddoZhu commented 4 years ago

I ran into the same feature request when creating dynamic sparse matrices on the fly.

In my case, I need to instantiate a sparse matrix of shape 25,000 × 25,000 with nnz = 25,000,000. The matrix is only used once, in an SPMM operator. Here is my benchmark of the two steps: 1) creating the COO tensor: 1.09s; 2) SPMM (25,000 × 25,000 times 25,000 × 64): 0.03s.

I think the reason is that the index check is very slow and is done on the CPU. Given that my indices and values are already valid and placed on the GPU, the check is redundant.

The easiest workaround would be a direct Python API to at::_sparse_coo_tensor_unsafe. Could you add it to the Python interface later?
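
A sketch of how this construction overhead can be measured (torch._sparse_coo_tensor_unsafe is internal and only present in some versions, hence the hasattr probe; GPU timings need explicit synchronization):

import time
import torch

N, nnz = 25_000, 25_000_000
device = "cuda" if torch.cuda.is_available() else "cpu"
i = torch.randint(0, N, (2, nnz), device=device)
v = torch.ones(nnz, device=device)

def timed(fn):
    # Synchronize around the call so GPU kernels are fully counted.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

print("checked  :", timed(lambda: torch.sparse_coo_tensor(i, v, (N, N))))
if hasattr(torch, "_sparse_coo_tensor_unsafe"):  # internal, version-dependent
    print("unchecked:", timed(lambda: torch._sparse_coo_tensor_unsafe(i, v, (N, N))))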

KiddoZhu commented 4 years ago

@SsnL It looks like your workaround doesn't work in 1.4.0. The sparse tensor is not modified.

aocsa commented 3 years ago

> @SsnL It looks like your workaround doesn't work in 1.4.0. The sparse tensor is not modified.

There is already a direct Python API to at::_sparse_coo_tensor_unsafe: torch._sparse_coo_tensor_unsafe.
I have rewritten the earlier sample code to use this function; as you can see, we get the right results. If this is good enough, I think this issue should be closed. Otherwise, I can add the validate_index parameter to the sparse_coo_tensor API as originally suggested.

Looking forward to your comments @KiddoZhu, @ngimel, @rgommers

# 14945
import torch
import numpy as np

def sp(index, data, size, coalesced=False):
    t = torch._sparse_coo_tensor_unsafe(index, data, size=size, device=index.device, dtype=data.dtype)
    t._coalesced_(coalesced)
    return t

N = 10

# Ones at positions (i, i+1) and (i, i).
row = torch.cat([torch.arange(N-1), torch.arange(N)])
col = torch.cat([torch.arange(N-1) + 1, torch.arange(N)])
index = torch.cat([row.unsqueeze(0), col.unsqueeze(0)])
data = torch.ones((len(row),))
size = (N, N)

spmat = sp(index, data, size, True)
print(spmat.to_dense())
X = torch.arange(N).unsqueeze(1).float()
Y = torch.spmm(spmat, X)
print(Y)

np_spmat = spmat.to_dense().numpy()
np_X = X.numpy()
np_Y = np.dot(np_spmat, np_X)
print(np_Y)

tensor([[1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 1.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
tensor([[ 1.],
        [ 3.],
        [ 5.],
        [ 7.],
        [ 9.],
        [11.],
        [13.],
        [15.],
        [17.],
        [ 9.]])
[[ 1.]
 [ 3.]
 [ 5.]
 [ 7.]
 [ 9.]
 [11.]
 [13.]
 [15.]
 [17.]
 [ 9.]]

ngimel commented 3 years ago

We should either rename _sparse_coo_tensor_unsafe to drop the leading underscore and document it, or add a validate argument. Either way, it should be a documented, user-facing function.

KiddoZhu commented 3 years ago

@aocsa Exactly. The problem is that the API only exists in old PyTorch versions. I am using 1.4.0 and there is no way to access this function.

For now, I solve this problem with a custom JIT extension exposing at::_sparse_coo_tensor_unsafe to Python.
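
For reference, a sketch of that kind of workaround via torch.utils.cpp_extension.load_inline (the extension and function names here are made up, and it assumes the installed libtorch still ships at::_sparse_coo_tensor_unsafe):

import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
torch::Tensor unsafe_coo(torch::Tensor indices, torch::Tensor values,
                         std::vector<int64_t> size) {
  // Forward to the internal ATen constructor that skips the index check.
  return at::_sparse_coo_tensor_unsafe(indices, values, size);
}
"""

ext = load_inline(name="unsafe_coo_ext", cpp_sources=cpp_source,
                  functions=["unsafe_coo"])  # auto-generates the pybind11 binding

# usage: same index/data arguments as the examples above
# spmat = ext.unsafe_coo(index, data, [10, 10])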

mruberry commented 3 years ago

I think we would accept a PR implementing a "validate" argument.

KiddoZhu commented 3 years ago

Great. I can do that.