Closed dhorka closed 5 years ago
Exactly.
Ah! Yes, the pooling map in ecc is obtained by a nearest neighbor search on the coarsened positions, not based on voxel affiliation like in PyG.
Can we add a flag to choose between the two algorithms?
from torch_geometric.nn import voxel_grid
from torch_geometric.nn.pool.consecutive import consecutive_cluster
from torch_geometric.utils import scatter_
from torch_cluster import nearest

# Voxelize the point cloud, padding the grid by half a voxel on each side.
cluster = voxel_grid(
    data.pos,
    data.batch,
    self.pool_rad,
    start=data.pos.min(dim=0)[0] - self.pool_rad * 0.5,
    end=data.pos.max(dim=0)[0] + self.pool_rad * 0.5)
cluster, perm = consecutive_cluster(cluster)
# Superpoints: mean position of the points in each voxel.
new_pos = scatter_('mean', data.pos, cluster)
new_batch = data.batch[perm]
# Reassign every point to its nearest superpoint (as done in ecc).
cluster = nearest(data.pos, new_pos, data.batch, new_batch)
data.x = scatter_('max', data.x, cluster)
data.pos = new_pos
data.batch = new_batch
return data
This computes new node features the way it is done in the ecc implementation.
We cannot simply add a flag to max_pool to achieve this. We would need to add our own implementation of this operator, or simply let users define it on their own.
Thanks! I will check it!
These are the results after checking:
ecc features conv1: (993, 16)
pygeometric features conv1: (993, 16)
Max difference between features of conv1 7.1525574e-07
ecc features conv2: (993, 32)
pygeometric features conv2: (993, 32)
Max difference between features of conv2 2.3841858e-07
Output of ecc pooling: (257, 32)
Output of PyGeometric pooling: torch.Size([28, 32])
Sum max_pool ecc 2773.065
Sum max_pool pygeometric 328.0657
Difference between features of max_pool1 2444.9993
Size positions ecc: (257, 3)
Size positions pyg: (28, 3)
Sum positions ecc: 7565.66983967046
Sum positions pyg: 841.835113007279
Difference positions: 6723.834726663181
It seems it is not achieving the same results, right? The code used for max pooling is this one:
class GraphMaxPooling(torch.nn.Module):
    def __init__(self, pool_rad):
        super(GraphMaxPooling, self).__init__()
        self.pool_rad = pool_rad
        self.graph = T.Compose([T.KNNGraph(9, loop=True), T.Cartesian(norm=False, cat=True)])

    def forward(self, data):
        cluster = voxel_grid(data.pos, data.batch, self.pool_rad,
                             start=data.pos.min(dim=0)[0] - self.pool_rad * 0.5,
                             end=data.pos.max(dim=0)[0] + self.pool_rad * 0.5)
        cluster, perm = consecutive_cluster(cluster)
        new_pos = scatter_('mean', data.pos, cluster)
        new_batch = data.batch[perm]
        cluster = nearest(data.pos, new_pos, data.batch, new_batch)
        data.x = scatter_('max', data.x, cluster)
        data.pos = new_pos
        data.batch = new_batch
        data.edge_attr = None
        data = self.graph(data)
        return data
ECC Weights and PyGeometric weights are equal: True
Loading weights
Starting validation:
ecc features conv1: (997, 16)
pygeometric features conv1: (997, 16)
Max difference between features of conv1 4.7683716e-07
ecc features conv2: (997, 32)
pygeometric features conv2: (997, 32)
Max difference between features of conv2 3.5762787e-07
Output of ecc pooling: (398, 32)
Output of PyGeometric pooling: torch.Size([47, 32])
Sum max_pool ecc 4054.4414
Sum max_pool pygeometric 509.8744
Difference between features of max_pool1 3544.567
Size positions ecc: (398, 3)
Size positions pyg: (47, 3)
Sum positions ecc: 16834.863076248246
Sum positions pyg: 1998.5325911267726
Difference positions: 14836.330485121474
Pygeomtric Acc: 63.65638766519823 Ecc accuracy: 63.65638766519823
Pygeomtric Loss: 0.9878960048312128 Ecc Loss: 0.9878960125771908
I'm not sure if the print statements are buggy, but accuracy and loss are the same for me.
You are right, the prints are buggy. The hook is taking the value of the last max pooling, and I do not know why. Thank you very much!
I would like to ask if you could explain this part in more detail:
Ah! Yes, the pooling map in ecc is obtained by a nearest neighbor search on the coarsened positions, not based on voxel affiliation like in PyG.
I don't see the difference. By voxel affiliation, do you mean all the points that fall inside a voxel? And the new features and new positions are estimated from these points, right? Isn't that the same as saying that you are using the nearest neighbors?
Thanks,
Not necessarily. Imagine two neighboring voxels with two resulting mean superpoints marked as x (the resulting coarsened points) and an outlier o:
+-----+-----+
|x | |
| o|x |
+-----+-----+
PyG max_pool pools the o into the left voxel, whereas ecc pools the o into the right voxel (because it is nearer to the superpoint in the right voxel).
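To make the diagram concrete, here is a minimal pure-Python sketch of the two assignment rules. It is not the PyG or ecc code; the helper names (voxel_ids, superpoints, nearest_ids) and the toy coordinates are made up for illustration, and the voxel size is assumed to be 1.0.

```python
import math

def voxel_ids(points, size):
    """Voxel affiliation: floor each 2D coordinate to a grid cell."""
    return [(math.floor(x / size), math.floor(y / size)) for x, y in points]

def superpoints(points, ids):
    """Mean position of the points sharing a cluster id."""
    acc = {}
    for (x, y), cid in zip(points, ids):
        sx, sy, n = acc.get(cid, (0.0, 0.0, 0))
        acc[cid] = (sx + x, sy + y, n + 1)
    return {cid: (sx / n, sy / n) for cid, (sx, sy, n) in acc.items()}

def nearest_ids(points, centers):
    """Nearest-neighbor affiliation: pick the closest superpoint."""
    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return [min(centers, key=lambda cid: sq_dist(p, centers[cid])) for p in points]

# Hypothetical toy data: three points plus an outlier (index 3) in the left
# voxel, and three points in the right voxel.
points = [(0.1, 0.9), (0.1, 0.8), (0.2, 0.9), (0.95, 0.2),
          (1.1, 0.2), (1.2, 0.1), (1.1, 0.1)]
by_voxel = voxel_ids(points, 1.0)
centers = superpoints(points, by_voxel)
by_nearest = nearest_ids(points, centers)

print(by_voxel[3])    # (0, 0): the outlier sits in the left voxel
print(by_nearest[3])  # (1, 0): but the right superpoint is closer
```

The superpoints themselves are the same in both schemes; only the map from original points to superpoints differs, which is why the pooled features can differ.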
Oh I see!! Then for each point you find the nearest "superpoint" right?
Yes, we basically create a new cluster vector based on superpoint positions and initial point cloud.
Then, if I understood properly, the whole process is:
You create the voxels using voxel_grid. After that, you estimate the "superpoint" for each voxel, and then you assign each point to its nearest superpoint. Right?
What I am missing at the moment is the purpose of the function consecutive_cluster(cluster).
Yes. The voxel_grid method creates non-consecutive cluster ids, e.g., [1, 5, 1, 5, 10]. The consecutive_cluster method redefines cluster ids to [0, 1, 0, 1, 2] so we can use basic scatter ops for pooling and coarsening.
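As an illustration of why consecutive ids matter, here is a small pure-Python sketch. The helpers consecutive_ids and scatter_mean are hypothetical stand-ins, not the library functions; the scatter here simply allocates one output slot per id from 0 to the largest id.

```python
def consecutive_ids(cluster):
    """Relabel arbitrary cluster ids to 0..K-1, preserving their sorted order."""
    remap = {old: new for new, old in enumerate(sorted(set(cluster)))}
    return [remap[c] for c in cluster]

def scatter_mean(values, index):
    """Average the values that share an index; output has max(index) + 1 slots."""
    size = max(index) + 1
    sums, counts = [0.0] * size, [0] * size
    for v, i in zip(values, index):
        sums[i] += v
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

raw = [1, 5, 1, 5, 10]
ids = consecutive_ids(raw)
print(ids)                                             # [0, 1, 0, 1, 2]
print(scatter_mean([1.0, 2.0, 3.0, 4.0, 5.0], ids))    # [2.0, 3.0, 5.0]
print(len(scatter_mean([1.0, 2.0, 3.0, 4.0, 5.0], raw)))  # 11 slots, mostly empty
```

With the raw ids the output would have 11 rows for only 3 real clusters, so relabeling first keeps the pooled tensors compact.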
And what does the perm variable hold? consecutive_cluster returns two values.
It has quite a strange name :D It holds one original idx for each cluster idx and is therefore used for filtering the batch indices.
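A rough pure-Python sketch of what such a perm vector could look like. The function name consecutive_cluster_sketch is hypothetical, and which member of a cluster ends up as its representative (here: the last one seen) is an assumption of this sketch, not a statement about the library.

```python
def consecutive_cluster_sketch(cluster):
    """Return (relabelled ids, perm), where perm[k] is one original index in cluster k."""
    remap = {old: new for new, old in enumerate(sorted(set(cluster)))}
    new_ids = [remap[c] for c in cluster]
    perm = [0] * len(remap)
    for i, c in enumerate(new_ids):
        perm[c] = i  # assumption: the last index seen is kept as representative
    return new_ids, perm

cluster = [1, 5, 1, 5, 10]
batch = [0, 0, 0, 0, 1]          # hypothetical: the last point is in a second graph
new_ids, perm = consecutive_cluster_sketch(cluster)
new_batch = [batch[i] for i in perm]

print(new_ids)    # [0, 1, 0, 1, 2]
print(perm)       # [2, 3, 4]
print(new_batch)  # [0, 0, 1]
```

Because a voxel never mixes points from different graphs in the batch, any representative of a cluster carries the right batch index, so indexing batch with perm yields one batch entry per superpoint.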
I see, thanks!! I think we can close the issue =D One last question: can I find the latest versions of torch_cluster and torch_geometric on PyPI?
Cool :) torch-cluster is released on PyPI, and PyG will follow soon.
Hi, I am posting this problem here because I am not sure whether it is related or not. Everything we did here was working perfectly on the test that I provided. However, in the training phase I get this error after 30 epochs:
File "/home/venv/graph/lib/python3.6/site-packages/torch_geometric-1.3.0-py3.6.egg/torch_geometric/nn/glob/glob.py", line 47, in global_mean_pool
File "/home/venv/graph/lib/python3.6/site-packages/torch_geometric-1.3.0-py3.6.egg/torch_geometric/utils/scatter.py", line 28, in scatter_
File "/home/venv/graph/lib/python3.6/site-packages/torch_scatter/mean.py", line 68, in scatter_mean
out = scatter_add(src, index, dim, out, dim_size, fill_value)
File "/home/venv/graph/lib/python3.6/site-packages/torch_scatter/add.py", line 72, in scatter_add
src, out, index, dim = gen(src, index, dim, out, dim_size, fill_value)
File "/home/venv/graph/lib/python3.6/site-packages/torch_scatter/utils/gen.py", line 17, in gen
index = index.view(index_size).expand_as(src)
RuntimeError: shape '[5543, 1]' is invalid for input of size 5544
srun: error: gpic10: task 0: Exited with exit code 1
=====================================================
In the training phase I am doing data augmentation, which includes a dropout of points. That means each epoch contains graphs of different sizes, and graphs within the same batch also have different sizes. I do not get this error if I use max_pool and the cluster with the default start and end provided in PyG. Moreover, the error is triggered in global_mean_pool, which seems weird to me. Do you have any clue what could be happening?
Try adding the size argument to the global_mean_pool op.
I tried, but it does not solve the issue.
Moreover, I ran an experiment that uses max_pool + cluster with the new start and end, and global_mean_pool works properly. I think the issue is related to the new code used to calculate the max pooling.
data.x = scatter_('max', data.x, cluster, dim_size=new_pos.size(0))
Oh I see, that solves the problem. But I do not understand why.
It seems that in rare cases, newly computed clusters via nearest neighbor result in a different amount of clusters produced, e.g., there can be empty clusters, hence new_x and new_pos have a different shape.
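A minimal pure-Python sketch of the shape mismatch described here. The scatter_max helper is a hypothetical stand-in that mimics only the sizing behaviour: without dim_size the output length is inferred from the largest index, and empty slots are filled with zeros (an assumption matching the "zero representation" mentioned in this thread).

```python
def scatter_max(values, index, dim_size=None):
    """Max of the values sharing an index; empty slots become 0.0."""
    size = max(index) + 1 if dim_size is None else dim_size
    out = [None] * size
    for v, i in zip(values, index):
        out[i] = v if out[i] is None else max(out[i], v)
    return [0.0 if v is None else v for v in out]

# Hypothetical case: 3 superpoints exist, but after the nearest-neighbor
# reassignment no point maps to the last one (cluster 2 is empty).
num_superpoints = 3
cluster = [0, 0, 1, 1]
x = [1.0, 4.0, 2.0, 3.0]

print(scatter_max(x, cluster))                            # [4.0, 3.0]
print(scatter_max(x, cluster, dim_size=num_superpoints))  # [4.0, 3.0, 0.0]
```

Without dim_size the features have 2 rows while the positions still have 3, which is the kind of off-by-one that surfaces later in global_mean_pool.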
Can we know which cluster is empty? I mean, it would be better to suppress the empty cluster. Is it always the last one?
No, it's not; that's just the case in which it crashes. You cannot suppress empty clusters, because officially they are there (they have a point in space), just with a zero representation.
But if no original node is assigned to this cluster (by the nearest algorithm, I mean), this new point doesn't have any features, and I think it is not useful to have this node in my graph. How can I know which node has no representation? As far as I understood, dim_size adds zero padding to the end of the feature matrix, right?
I guess you need to recompute pos_new and batch_new (untested) based on the new cluster:
cluster = nearest(data.pos, new_pos, data.batch, new_batch)
data.x = scatter_('max', data.x, cluster)
data.pos = scatter_('mean', data.pos, cluster)
data.batch = torch.scatter(0, cluster, data.batch)
The calculation of the new data.batch is not working:
TypeError: scatter() received an invalid combination of arguments - got (int, Tensor, Tensor), but expected one of:
data.batch.new_empty(data.pos.size(0)).scatter_(0, cluster, data.batch) should do the trick.
cluster size:, torch.Size([993])
new batch size: torch.Size([993])
x size: torch.Size([257, 32])
new pos size: torch.Size([257, 3])
These are the sizes of each tensor. As you can see, new_batch is not generated properly. This is due to the fact that the cluster size is 993.
The new data.batch has the shape of data.pos.size(0), which should be 257.
def forward(self, data):
    cluster = voxel_grid(data.pos, data.batch, self.pool_rad,
                         start=data.pos.min(dim=0)[0] - self.pool_rad * 0.5,
                         end=data.pos.max(dim=0)[0] + self.pool_rad * 0.5)
    cluster, perm = consecutive_cluster(cluster)
    new_pos = scatter_('mean', data.pos, cluster)
    new_batch = data.batch[perm]
    cluster = nearest(data.pos, new_pos, data.batch, new_batch)
    data.x = scatter_('max', data.x, cluster)
    data.pos = scatter_('mean', data.pos, cluster)
    data.batch.new_empty(data.pos.size(0)).scatter_(0, cluster, data.batch)
    print("cluster size:, ", cluster.size())
    print("new batch size: ",data.batch.size())
    print(" x size: ",data.x.size())
    print(" new pos size: ",data.pos.size())
    #data.pos = new_pos
    #data.batch = new_batch
    data.edge_attr = None
    #data.edge_attr = None
    #data = max_pool(cluster, data)
    data = self.graph(data)
    return data
This is the code that I am using, but I get a new batch with size 993.
data.batch = data.batch.new_empty(data.pos.size(0)).scatter_(0, cluster, data.batch)
:D
Oh! You are completely right xDD After that I was getting this error:
RuntimeError: invalid argument 3: Index tensor must not have larger size than input tensor, but got index [993] input [257]
I modified the code:
data.batch = data.batch.new_empty(data.pos.size(0)).scatter_(0, torch.unique(cluster), data.batch)
And then it seems to work. But I get this error with radius_graph:
row, col = radius(x, x, r, batch, batch, max_num_neighbors + 1)
File "/work/test_modelnet_ecc_pygeometric/env/pytorch11_geometric/lib/python3.6/site-packages/torch_cluster-1.4.2-py3.6-linux-x86_64.egg/torch_cluster/radius.py", line 61, in radius
max_num_neighbors)
RuntimeError: scan failed to synchronize: device-side assert triggered
Same error with KNN:
File "work/test_modelnet_ecc_pygeometric/env/pytorch11_geometric/lib/python3.6/site-packages/torch_cluster-1.4.2-py3.6-linux-x86_64.egg/torch_cluster/knn.py", line 113, in knn_graph
row, col = knn(x, x, k if loop else k + 1, batch, batch)
File "/work/test_modelnet_ecc_pygeometric/env/pytorch11_geometric/lib/python3.6/site-packages/torch_cluster-1.4.2-py3.6-linux-x86_64.egg/torch_cluster/knn.py", line 57, in knn
return torch_cluster.knn_cuda.knn(x, y, k, batch_x, batch_y)
RuntimeError: scan failed to synchronize: device-side assert triggered
No, you shouldn't use a unique call there. I do not understand the first error though, because before calling scatter_, data.batch and cluster should have equal shape.
Sorry, it was a mistake in my code, and I found it. Now I do not need torch.unique. But I am getting this error:
File "work/test_modelnet_ecc_pygeometric/env/pytorch11_geometric/lib/python3.6/site-packages/torch_cluster-1.4.2-py3.6-linux-x86_64.egg/torch_cluster/knn.py", line 113, in knn_graph
row, col = knn(x, x, k if loop else k + 1, batch, batch)
File "/work/test_modelnet_ecc_pygeometric/env/pytorch11_geometric/lib/python3.6/site-packages/torch_cluster-1.4.2-py3.6-linux-x86_64.egg/torch_cluster/knn.py", line 57, in knn
return torch_cluster.knn_cuda.knn(x, y, k, batch_x, batch_y)
RuntimeError: scan failed to synchronize: device-side assert triggered
The code used:
def forward(self, data):
    cluster = voxel_grid(data.pos, data.batch, self.pool_rad,
                         start=data.pos.min(dim=0)[0] - self.pool_rad * 0.5,
                         end=data.pos.max(dim=0)[0] + self.pool_rad * 0.5)
    cluster, perm = consecutive_cluster(cluster)
    new_pos = scatter_('mean', data.pos, cluster)
    new_batch = data.batch[perm]
    cluster = nearest(data.pos, new_pos, data.batch, new_batch)
    data.x = scatter_('max', data.x, cluster)
    data.pos = scatter_('mean', data.pos, cluster)
    data.batch = data.batch.new_empty(data.pos.size(0)).scatter_(0, cluster, data.batch)
    print("cluster size:, ", cluster.size())
    print("new batch size: ",data.batch.size())
    print(" x size: ",data.x.size())
    print(" new pos size: ",data.pos.size())
    #data.pos = new_pos
    #data.batch = new_batch
    data.edge_attr = None
    #data.edge_attr = None
    #data = max_pool(cluster, data)
    data = self.graph(data)
    return data
Makes sense, you sadly need to call consecutive_cluster a second time due to possibly empty clusters :(
cluster = voxel_grid(
    data.pos,
    data.batch,
    self.pool_rad,
    start=data.pos.min(dim=0)[0] - self.pool_rad * 0.5,
    end=data.pos.max(dim=0)[0] + self.pool_rad * 0.5)
cluster, perm = consecutive_cluster(cluster)
new_pos = scatter_('mean', data.pos, cluster)
new_batch = data.batch[perm]
cluster = nearest(data.pos, new_pos, data.batch, new_batch)
cluster, perm = consecutive_cluster(cluster)
data.x = scatter_('max', data.x, cluster)
data.pos = scatter_('mean', data.pos, cluster)
data.batch = data.batch[perm]
Hmmm, I think I do not properly understand the consecutive_cluster function. Why is it not the same to put the consecutive_cluster call after the feature pooling, like this:
cluster = nearest(data.pos, new_pos, data.batch, new_batch)
#cluster, perm = consecutive_cluster(cluster)
data.x = scatter_('max', data.x, cluster)
cluster, perm = consecutive_cluster(cluster)
data.pos = scatter_('mean', data.pos, cluster)
data.batch = data.batch[perm]
I am asking because I was thinking of doing this extra calculation only in the case of data.x.size(0) != new_pos.size(0).
Mh, that doesn't make sense. Either recompute node positions based on the new cluster assignments, or allow zero feature representations for nodes. The thing is, the size mismatch is not a good condition for detecting empty clusters, because it only detects an empty cluster at the end, and nothing else.
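To illustrate why the size check is not enough, here is a small pure-Python sketch that counts the members of each cluster. The helper empty_clusters is hypothetical; the point is only that an empty cluster in the middle leaves the inferred output size unchanged.

```python
def empty_clusters(cluster, num_clusters):
    """Return the ids in 0..num_clusters-1 that no point is assigned to."""
    counts = [0] * num_clusters
    for c in cluster:
        counts[c] += 1
    return [i for i, n in enumerate(counts) if n == 0]

num_superpoints = 3

# Empty cluster at the end: the inferred size (max id + 1) shrinks to 2,
# so a size comparison notices it.
tail_empty = [0, 0, 1, 1]
print(max(tail_empty) + 1 != num_superpoints)       # True
print(empty_clusters(tail_empty, num_superpoints))  # [2]

# Empty cluster in the middle: the inferred size is still 3,
# so the size comparison passes although cluster 1 has no points.
middle_empty = [0, 0, 2, 2]
print(max(middle_empty) + 1 != num_superpoints)       # False
print(empty_clusters(middle_empty, num_superpoints))  # [1]
```

Relabelling the reassigned cluster vector to consecutive ids removes both kinds of gaps at once, which is why the check on sizes cannot replace it.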
@dhorka @rusty1s could you please send me the script to reproduce ecc by torch_geometric?
As I mentioned in #319, I have problems reproducing the ecc implementation using pytorch_geometric. I found some differences between the results obtained. First, the two convolution operations give different results when using the same weights. Moreover, the results of the pooling layers are also different.
I created a test that checks these things. Basically, the script loads the same weights into both implementations. These weights are obtained by training a network using the ecc implementation. Below you can see the output of my test.
As you can observe, this difference has an impact on the accuracy when using the same weights. You can find the source code here. One important thing: the data used for these tests is obtained from the original ecc code.