pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License
21.51k stars 3.69k forks source link

Bug in voxel grid generation #319

Closed dhorka closed 5 years ago

dhorka commented 5 years ago

Hi,

I have one doubt regarding the voxel grid. In some of my point clouds this error is triggered:

File "/home/dhorka/phd/code/test_pool.py", line 251, in forward                                                                                   
    data = avg_pool(cluster, data)
  File "/home/dhorka/venv/pytorch11_geometric/lib/python3.6/site-packages/torch_geometric/nn/pool/avg_pool.py", line 61, in avg_pool                                                       
    index, attr = pool_edge(cluster, data.edge_index, data.edge_attr)
  File "/home/dhorka/venv/pytorch11_geometric/lib/python3.6/site-packages/torch_geometric/nn/pool/pool.py", line 11, in pool_edge                                                          
    num_nodes)
  File "/home/dhorka/venv/pytorch11_geometric/lib/python3.6/site-packages/torch_sparse/coalesce.py", line 29, in coalesce                                                                  
    _, perm = unique(row * n + col)
  File "/home/dhorka/venv/pytorch11_geometric/lib/python3.6/site-packages/torch_sparse/utils/unique.py", line 12, in unique                                                                
    out, perm = torch_sparse.unique_cuda.unique(src)
RuntimeError: cuda runtime error (9) : invalid configuration argument at /home/dhorka/libs/pytorch/aten/src/THC/generic/THCTensorMathReduce.cu:293                                      

In order to replicate the error in a more controlled environment I found that, this error is triggered when the voxel grid acts as a global pooling , it is only one grid required to aggroup all the nodes. The following code is what I used:

cluster = voxel_grid(data.pos, data.batch, 100, start = -10e7,end = 10e7)
 data.edge_attr = None                                                                                                                                      
 data = avg_pool(cluster, data)

Is there anything that I am doing wrong? Is it intended this behaviour? I know that there is a specific method to go a global pooling, but it is not my intention to do a global pooling, but some times due to the density of my point cloud, only one grid is required.

Thanks

rusty1s commented 5 years ago

I will look into it.

dhorka commented 5 years ago

I think there is a bug in the voxel grid generation. Because I compared the results obtained by the method used in the original ecc code and the resulting groups are different. I attached the code that I used to generate the voxel grid.

def voxel_gridopen3d(graph, res_down):
    src_cloud = open3d.PointCloud()
    src_cloud.points = open3d.Vector3dVector(graph.pos.detach().cpu().numpy())

    # down_sampling

    dst_cloud = open3d.voxel_down_sample(src_cloud, voxel_size=res_down)

    dst_kd = open3d.KDTreeFlann(dst_cloud)

    ind = [dst_kd.search_knn_vector_3d(src_cloud.points[i], 1)[1] for i in range(len(src_cloud.points))]

    return torch.tensor(ind, dtype=torch.long).t().squeeze().to(graph.pos.device) 
rusty1s commented 5 years ago

Hi, the error occurred due to an empty edge_index tensor after coarsening. I fixed it in master (although the error depends on which PyTorch version you are using).

I do not know what you mean by saying there is a bug in the voxel grid generation method. This method is well tested and is implemented differently than the code you posted above. It creates dense cluster indices which get sparsified later on in avg_pool or max_pool. So there is no reason to expect that both methods output the same cluster values.

dhorka commented 5 years ago

Well I expected to have the same output because in the readme of the repo is said that the voxel_grid_pooling is the implementation of the operation proposed in: Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs. The code I posted is extracted from the official repo of the paper, for this reason I am expecting the same result.

rusty1s commented 5 years ago

Well, cluster indices do not mean much and can vary based on implementation (e.g. different order of dimensions), but rather the coarsened graph/pooled node features should be equal. Isn't this the case?

dhorka commented 5 years ago

It is not the case, both implementations have different results. Moreover, I found that the results of the nnconv with the parameters properly set to have the same behaviour as ecc convolution does not have the same results as the original implementation of ecc. In a couple of hours I can share with you the code that I am using to compare the results of both implementations.