Closed mirceta closed 5 years ago

Hello,
I have a custom dataset with no rgb and only 2 classes. In the train() function I get a float division by zero error during confusion_matrix.get_average_intersection_union().
I also found that the loop above this call runs for 0 iterations. The partition is successful, but after calling
xyz, rgb, labels = libply_c.prune(xyz, args.voxel_width, rgb, labels, n_labels)
the labels become vectors of 3 components and take a larger range of values (up to 76), whereas before they were just 0 or 1. Is this correct?
Here are the arguments to all the scripts I call (partition/partition.py, learning/custom_dataset.py, learning/main.py, in this order):
You can also check out https://github.com/FloatingObjectSegmentation/superpoint_graph/tree/adapt-to-mag/learning to see how the code is changed.
Hi,
labels become vectors of 3 components and have a larger range of values (up to 76), whereas before they were just 0 or 1. Is this correct?
No. Class 0 is reserved for unlabelled data. If all your data is labelled, make the first class 1 and the second class 2. Be careful: the predicted values will be shifted by 1 (since 'unlabelled' is never predicted), which can be confusing.
Then, when pruning, the label associated with each voxel is the histogram of the labels of the points it contains: 3 columns (unlabelled / class 1 / class 2), each holding a point count. If you have values up to 76, you are probably subsampling a little aggressively, but it really depends on your sensor (for example, if the density is variable).
I think the class_meter is confused because it sees clouds with only class 0 and thinks none are annotated. See if the above fixes it.
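In other words, a minimal sketch of the labelling scheme, assuming two real classes (illustrative only; the actual histogramming happens inside libply_c.prune):

import numpy as np

# Raw per-point labels: remap {0, 1} -> {1, 2}, since class 0 means 'unlabelled'.
point_labels = np.array([0, 1, 1, 0, 1]) + 1   # now [1, 2, 2, 1, 2]

# After pruning, each voxel carries a histogram over (unlabelled, class 1, class 2).
# A voxel that swallowed 76 points, all of class 2, would look like this:
voxel_hist = np.array([0, 0, 76])
majority = voxel_hist.argmax()                 # 2: the majority class of the voxel
print(point_labels, majority)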
Oh, I understand, I didn't know they were histograms. I pruned a lot because I wanted it to finish quickly while I get it working. After that I'll prune less.
These are the changes I made:
I also tried:
What confuses me is: knowing that there is a label reserved for unlabelled points, should I treat the dataset as having 3 classes or 2? In this case, though, neither of the attempts worked.
I also tried decreasing the voxel_width argument in partition.py to 1, so I would not prune so much.
I keep getting the same error. Do you have any other ideas?
You should have 2 classes in get_info, and f_2 in the model config.
Before calling loss_meter.value(), print the following:
print(loss_meter.n)
print(loss_meter.sum[0])
It seems like n will be zero, but I am curious about sum.
More generally, print your prediction o_cpu and ground truth t_cpu at each iteration (note that they will already be shifted back to start at 0, with -100 for superpoints containing no ground-truth points at all).
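As a hedged mock-up of the failure mode you should expect to see (LossMeter below is just a stand-in for the accumulator used in learning/main.py, not the repository's actual class):

class LossMeter:
    def __init__(self):
        self.n = 0
        self.sum = [0.0]

    def add(self, v):
        self.sum[0] += v
        self.n += 1

    def value(self):
        # Raises ZeroDivisionError when no batch was ever processed.
        return self.sum[0] / self.n

meter = LossMeter()
print(meter.n)       # 0 if the batch loop never ran
print(meter.sum[0])  # 0.0
# meter.value() would raise ZeroDivisionError here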
Correct, loss_meter.n is 0. I cannot print the predictions because the loop under '# iterate over dataset in batches' is never entered. I think the loader was not built correctly. Perhaps something gets mixed up at line 165: train_dataset, test_dataset = create_dataset(args)
Still, the dataset does get loaded and the path is correct. It also reads the superpoint graph files, which are auto-generated, so I don't see where a mistake could sneak in.
I found another curious thing: at line 176 of learning/main.py,
logging.getLogger().getEffectiveLevel() > logging.DEBUG
is true, and then
loader = tqdm(loader, ncols=100)
gets executed. Is this correct?
Edit: I tried deleting the lines under the if (both the logging check and the tqdm wrapping) and the result was still the same: a zero division error.
I compared this to the run on Semantic3D and saw that only the number of files in the training set was different. So I add each file twice to trainlist in custom_dataset.py/get_datasets(), and now it goes through. It seems it was not working because all of my points were in a single file.
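An illustrative sketch of the workaround (not the exact code in custom_dataset.py; the path layout is assumed):

# List the single training file twice so the training split has more than one entry.
train_files = ['superpoint_graphs/train/data.h5']
trainlist = [f for f in train_files for _ in range(2)]
print(trainlist)  # the same file appears twice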
I now get a new error in train(), though:
embeddings = ptnCloudEmbedder.run(model, *clouds_data)
gives me
RuntimeError: Given groups=1, weight of size 64 11 1, expected input [835, 5, 128] to have 11 channels, but got 5 channels instead.
Hi,
if you only have one file for training, the problem might be the batch size. Try reducing the batch size to 1.
I assume your data does not have rgb? If so, you need to adapt --pc_attribs and --ptn_nfeat_stn.
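For xyz-only data, flags along these lines should match the 8 remaining per-point features (x, y, z, e, l, p, s, v); an illustrative invocation, with the dataset name and path as placeholders:

python learning/main.py --dataset custom_dataset --pc_attribs xyzelpsv --ptn_nfeat_stn 8 --batch_size 1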
Yes, the data does not have rgb. But I already set rgb to an empty list at line 158 of partition.py as instructed, set --ver_batch 1 for the partition, and set --pc_attribs xyzelpsv --ptn_nfeat_stn 8 in main.py.
I still get RuntimeError: Given groups=1, weight of size 64 8 1, expected input [96, 5, 128] to have 8 channels, but got 5 channels instead.
After this I set ptn_nfeat_stn to 5 and got:
Traceback (most recent call last):
  File "/home/km/superpoint_graph/learning/main.py", line 405, in <module>
    main()
  File "/home/km/superpoint_graph/learning/main.py", line 304, in main
    acc, loss, oacc, avg_iou = train()
  File "/home/km/superpoint_graph/learning/main.py", line 200, in train
    embeddings = ptnCloudEmbedder.run(model, *clouds_data)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 131, in run_full_monger
    out = model.ptn(Variable(clouds, volatile=True), Variable(clouds_global, volatile=True))
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 90, in forward
    T = self.stn(input[:,:self.nfeat_stn,:])
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 47, in forward
    input = self.convs(input)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 196, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Also, if it helps, here is the output from partition.py:
=================
train/
=================
1 / 2---> data
creating the feature file...
=========================
======== pruning ========
=========================
Voxelization into 133670 x 33680 x 206 grid
Reduced from 22713486 to 429575 points (1.89%)
(429575, 3)
[[4.49018625e+05 1.21040445e+05 4.05682587e+02]
[4.49016094e+05 1.21042156e+05 4.19194794e+02]
[4.49019000e+05 1.21042766e+05 4.05488037e+02]
[4.49015844e+05 1.21042203e+05 4.05744904e+02]
[4.49015812e+05 1.21045570e+05 4.05728302e+02]]
3.0
10
45
93% done
computing the superpoint graph...
minimal partition...
L0-CUT PURSUIT WITH L2 FIDELITY
PARAMETERIZATION = SPECIAL SUPERPOINTGRAPH
Graph 429577 vertices and 10309800 edges and observation of dimension 4
computation of the SPG...
Timer : 5.0 / 21.0 / 20.2
2 / 2---> data2
creating the feature file...
=========================
======== pruning ========
=========================
Voxelization into 133670 x 33680 x 206 grid
Reduced from 22713486 to 429575 points (1.89%)
(429575, 3)
[[4.49018625e+05 1.21040445e+05 4.05682587e+02]
[4.49016094e+05 1.21042156e+05 4.19194794e+02]
[4.49019000e+05 1.21042766e+05 4.05488037e+02]
[4.49015844e+05 1.21042203e+05 4.05744904e+02]
[4.49015812e+05 1.21045570e+05 4.05728302e+02]]
3.0
10
45
95% done
computing the superpoint graph...
minimal partition...
L0-CUT PURSUIT WITH L2 FIDELITY
PARAMETERIZATION = SPECIAL SUPERPOINTGRAPH
Graph 429577 vertices and 10309800 edges and observation of dimension 4
computation of the SPG...
Timer : 9.9 / 41.8 / 40.4
=================
test/
=================
1 / 2---> data
creating the feature file...
=========================
======== pruning ========
=========================
Voxelization into 133670 x 33680 x 206 grid
Reduced from 22713486 to 429575 points (1.89%)
(429575, 3)
[[4.49018625e+05 1.21040445e+05 4.05682587e+02]
[4.49016094e+05 1.21042156e+05 4.19194794e+02]
[4.49019000e+05 1.21042766e+05 4.05488037e+02]
[4.49015844e+05 1.21042203e+05 4.05744904e+02]
[4.49015812e+05 1.21045570e+05 4.05728302e+02]]
3.0
10
45
95% done
computing the superpoint graph...
minimal partition...
L0-CUT PURSUIT WITH L2 FIDELITY
PARAMETERIZATION = SPECIAL SUPERPOINTGRAPH
Graph 429577 vertices and 10309800 edges and observation of dimension 4
computation of the SPG...
Timer : 14.8 / 62.7 / 60.5
2 / 2---> data2
creating the feature file...
=========================
======== pruning ========
=========================
Voxelization into 133670 x 33680 x 206 grid
Reduced from 22713486 to 429575 points (1.89%)
(429575, 3)
[[4.49018625e+05 1.21040445e+05 4.05682587e+02]
[4.49016094e+05 1.21042156e+05 4.19194794e+02]
[4.49019000e+05 1.21042766e+05 4.05488037e+02]
[4.49015844e+05 1.21042203e+05 4.05744904e+02]
[4.49015812e+05 1.21045570e+05 4.05728302e+02]]
3.0
10
45
95% done
computing the superpoint graph...
minimal partition...
L0-CUT PURSUIT WITH L2 FIDELITY
PARAMETERIZATION = SPECIAL SUPERPOINTGRAPH
Graph 429577 vertices and 10309800 edges and observation of dimension 4
computation of the SPG...
Timer : 19.6 / 83.4 / 80.3
Process finished with exit code 0
1) If you set the batch size to 1 (--batch_size 1 in learning/main.py), you don't need to double your dataset.
2) Can you print your model?
3) What branch/commit are you running?
Here is the model:
Module(
(ecc): GraphNetwork(
(0): RNNGraphConvModule(
(_cell): GRUCellEx(
32, 32
(ini): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(inh): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
(ig): Linear(in_features=32, out_features=32, bias=True)
)(ingate layernorm)
(_fnet): Sequential(
(0): Linear(in_features=13, out_features=32, bias=True)
(1): ReLU(inplace)
(2): Linear(in_features=32, out_features=128, bias=True)
(3): ReLU(inplace)
(4): Linear(in_features=128, out_features=64, bias=True)
(5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): Linear(in_features=64, out_features=32, bias=False)
)
)
(1): Linear(in_features=352, out_features=2, bias=True)
)
(ptn): PointNet(
(stn): STNkD(
(convs): Sequential(
(0): Conv1d(5, 64, kernel_size=(1,), stride=(1,))
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace)
(6): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
(7): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace)
)
(fcs): Sequential(
(0): Linear(in_features=128, out_features=128, bias=True)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): Linear(in_features=128, out_features=64, bias=True)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace)
)
(proj): Linear(in_features=64, out_features=4, bias=True)
)
(convs): Sequential(
(0): Conv1d(8, 64, kernel_size=(1,), stride=(1,))
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace)
(6): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
(7): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace)
(9): Conv1d(128, 128, kernel_size=(1,), stride=(1,))
(10): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ReLU(inplace)
(12): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
(13): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(14): ReLU(inplace)
)
(fcs): Sequential(
(0): Linear(in_features=257, out_features=256, bias=True)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): Linear(in_features=256, out_features=64, bias=True)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace)
(6): Linear(in_features=64, out_features=32, bias=True)
)
)
)
1 - It won't solve the cuda error, but it will solve the training loop not being executed.
3 - This commit is obsolete. If you want to stay on it, at least revert it locally by changing track_running_stats back to False.
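A minimal sketch of that revert, assuming the norm layers are built as shown in the printed model (the actual construction lives in learning/pointnet.py and the graph network modules):

import torch.nn as nn

# With track_running_stats=False the layer always normalises with the current
# batch statistics, matching the behaviour this commit was written for.
bn = nn.BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
inorm = nn.InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)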
Your input is not on the GPU (torch.FloatTensor and not torch.cuda.FloatTensor). Line 127 should convert the inputs to cuda tensors. Are you running with --cuda 0?
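For reference, a hedged sketch of what that device handling amounts to (the names are illustrative, not the repository's code):

import torch

args_cuda = torch.cuda.is_available()
clouds = torch.randn(2, 8, 128)      # stand-in for a batch of point clouds
if args_cuda:
    clouds = clouds.cuda()           # a CUDA model needs CUDA inputs
print(clouds.device)                 # should say cuda:0 when training on the GPU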
Good point. I was running from within PyCharm and didn't prepend CUDA_VISIBLE_DEVICES=0. I have set track_running_stats to False and added --cuda 0 (this is the same as prepending CUDA_VISIBLE_DEVICES=0, right?). The error message changes now, but it's again
RuntimeError: Given groups=1, weight of size 64 8 1, expected input [48, 5, 128] to have 8 channels, but got 5 channels instead.
At the same line. Here are the args again:
--dataset custom_dataset --CUSTOM_SET_PATH /media/km/ad02048a-21c3-4454-b1b4-58c5a99df3c5/workspace --epochs 10 --lr_steps '[275,320]' --test_nth_epoch 2 --model_config gru_10,f_2 --nworkers 2 --pc_attribs xyzelpsv --odir "results" --ptn_nfeat_stn 5 --batch_size 1 --cuda 0
I also tried changing ptn_nfeat_stn, with no luck.
Also, is the 2nd dimension of the input matrix the features? It seems weird that there are 5. Though the input is the superpoint graph, so it must have been transformed?
I tried running from the terminal with CUDA_VISIBLE_DEVICES=0 too, with the same result.
Edit 3: another really curious thing: in conv.py, the forward() method, where the error occurs, executes 3 times before the error, so the problem is possibly somewhere in the middle of the network.
Do you have a GPU? If so, you should run with --cuda 1.
At which line does the error occur?
Print the size of P at the end of the function load_superpoint in learning/spg.py; it seems that your point clouds load with the wrong number of columns for some reason.
Yes, exactly, I just found it! In spg.py I forgot to shift the indices of e and lpsv by 3 to the left, because there are no rgb values. It's working now!
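For illustration, the column bookkeeping I got wrong looks roughly like this (the layout below is assumed; the real slicing lives in learning/spg.py):

import numpy as np

# Assumed column layout of the parsed point array:
#   with rgb:    xyz(0:3)  rgb(3:6)  e(6)  lpsv(7:11)  -> 11 columns
#   without rgb: xyz(0:3)            e(3)  lpsv(4:8)   ->  8 columns
def split_features(P, has_rgb):
    off = 0 if has_rgb else -3   # shift e/lpsv 3 columns left when rgb is absent
    e = P[:, 6 + off]
    lpsv = P[:, 7 + off:11 + off]
    return e, lpsv

P = np.zeros((5, 8), dtype=np.float32)   # an xyz-only cloud: 8 columns
e, lpsv = split_features(P, has_rgb=False)
print(P.shape, e.shape, lpsv.shape)      # (5, 8) (5,) (5, 4)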
And yes, I have a GPU, but when I set --cuda 1 it doesn't work again. It's
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Traceback (most recent call last):
  File "/home/km/superpoint_graph/learning/main.py", line 405, in <module>
    main()
  File "/home/km/superpoint_graph/learning/main.py", line 304, in main
    acc, loss, oacc, avg_iou = train()
  File "/home/km/superpoint_graph/learning/main.py", line 200, in train
    embeddings = ptnCloudEmbedder.run(model, *clouds_data)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 131, in run_full_monger
    out = model.ptn(Variable(clouds, volatile=True), Variable(clouds_global, volatile=True))
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 90, in forward
    T = self.stn(input[:,:self.nfeat_stn,:])
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/superpoint_graph/learning/pointnet.py", line 47, in forward
    input = self.convs(input)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/anaconda3/envs/newenv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 196, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Add at line 130 of pointnet.py:
print(clouds.device)
print(clouds_global.device)
Maybe try removing the Variable wrapper, since it's obsolete now? What version of pytorch are you using?
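Since PyTorch 0.4, Variable(x) is a no-op wrapper and volatile=True is ignored; the modern inference-mode equivalent is torch.no_grad(). A minimal sketch:

import torch

x = torch.randn(4, 8, 128)
with torch.no_grad():      # replaces Variable(x, volatile=True)
    y = x.mean()
print(y.requires_grad)     # False: no autograd graph is built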
Whether I put --cuda 1 or --cuda 0, the prints say the device is cpu. The torch version is 1.1.0. Which Variable do you mean?
Which Variable do you mean?
line 131 of pointnet.py
Both if I put --cuda 1 or --cuda 0, the prints say device is cpu.
Weird. Check whether the if clause at line 127 of pointnet.py is entered with --cuda 1, either by running in debug mode or by adding a print inside the clause, and print self.args.cuda just before line 127.
Wow, weird, you are right. Even though I pass --cuda 1, self.args.cuda is 0. Also, when I start main.py, args.cuda = 1, and even when CloudEmbedder is instantiated self.args.cuda within CloudEmbedder is still 1, but by the time the run_full_monger method runs it becomes 0.
Edit: sorry, it was a line at the start of the train() loop where I manually set it to 0 to work around a previous error.
I now get the same error as this issue: https://github.com/loicland/superpoint_graph/issues/98. I will let you know what happens after I fix it.
Hey, after applying your fix for the above issue it now runs on the GPU as well. Thanks for all the help, Loic!
Glad to hear it!