I tested this in a CPU-only Docker environment, but the error persists.
I could skip the error and run the training by passing class_weights: [] as an empty list in the config file. However, the training and test results don't look right to me; the mIoU is very low. The test result is:
Overall Testing Accuracy : 0.108, mIoU : 0.058
Is the mIoU almost zero, or am I missing something?
INFO - 2023-02-10 17:32:09,648 - semantic_segmentation - Loss train: 1.693 eval: 1.692
INFO - 2023-02-10 17:32:09,648 - semantic_segmentation - Mean acc train: 0.174 eval: 0.267
INFO - 2023-02-10 17:32:09,648 - semantic_segmentation - Mean IoU train: 0.087 eval: 0.148
INFO - 2023-02-10 17:32:09,648 - semantic_segmentation - === EPOCH 798/800 ===
training: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:01<00:00, 42.48it/s]
validation: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 79.69it/s]
INFO - 2023-02-10 17:32:11,208 - semantic_segmentation - Loss train: 1.635 eval: 1.396
INFO - 2023-02-10 17:32:11,208 - semantic_segmentation - Mean acc train: 0.151 eval: 0.352
INFO - 2023-02-10 17:32:11,208 - semantic_segmentation - Mean IoU train: 0.081 eval: 0.182
INFO - 2023-02-10 17:32:11,208 - semantic_segmentation - === EPOCH 799/800 ===
training: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:01<00:00, 42.03it/s]
validation: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 84.19it/s]
INFO - 2023-02-10 17:32:12,779 - semantic_segmentation - Loss train: 1.659 eval: 1.652
INFO - 2023-02-10 17:32:12,779 - semantic_segmentation - Mean acc train: 0.163 eval: 0.267
INFO - 2023-02-10 17:32:12,779 - semantic_segmentation - Mean IoU train: 0.086 eval: 0.109
INFO - 2023-02-10 17:32:12,780 - semantic_segmentation - === EPOCH 800/800 ===
training: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:01<00:00, 39.32it/s]
validation: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 75.36it/s]
INFO - 2023-02-10 17:32:14,463 - semantic_segmentation - Loss train: 1.726 eval: 2.458
INFO - 2023-02-10 17:32:14,463 - semantic_segmentation - Mean acc train: 0.169 eval: 0.226
INFO - 2023-02-10 17:32:14,463 - semantic_segmentation - Mean IoU train: 0.086 eval: 0.072
INFO - 2023-02-10 17:32:14,552 - semantic_segmentation - Epoch 800: save ckpt to ./logs/KPFCNN_S3DIS_torch/checkpoint
INFO - 2023-02-10 18:15:06,649 - semantic_segmentation - Accuracy : [0.21310890361166648, 0.3248931736760441, 0.5463872361721226, 0.0, 0.0, 0.0, 0.0, 0.0011374866459067771, 0.0, 0.0, 0.0, 0.0, 0.3207777518363869, 0.10817727322631744]
INFO - 2023-02-10 18:15:06,649 - semantic_segmentation - IoU : [0.1751161378342312, 0.23549265987632134, 0.25156889849491443, 0.0, 0.0, 0.0, 0.0, 0.0010764818849628398, 0.0, 0.0, 0.0, 0.0, 0.09091394402481037, 0.05801293247040308]
INFO - 2023-02-10 18:15:06,658 - s3dis - Saved Area_3_office_6 in ./test/S3DIS/Area_3_office_6.npy.
test 21/23: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 53173/53173 [00:01<00:00, 44756.97it/s]
test 22/23: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13400/13400 [00:00<00:00, 119477.80it/s]
INFO - 2023-02-10 18:15:06,953 - semantic_segmentation - Accuracy : [0.21271639007411344, 0.3281782025759704, 0.5457846251452136, 0.0, 0.0, 0.0, 0.0, 0.0011374866459067771, 0.0, 0.0, 0.0, 0.0, 0.32096158074793024, 0.1083675603991642]
INFO - 2023-02-10 18:15:06,953 - semantic_segmentation - IoU : [0.1746411967770861, 0.23722240797212982, 0.2534656981039331, 0.0, 0.0, 0.0, 0.0, 0.0010754038786806772, 0.0, 0.0, 0.0, 0.0, 0.09036740278810888, 0.05821323919384143]
INFO - 2023-02-10 18:15:06,953 - s3dis - Saved Area_3_hallway_6 in ./test/S3DIS/Area_3_hallway_6.npy.
INFO - 2023-02-10 18:15:06,954 - semantic_segmentation - Overall Testing Accuracy : 0.1083675603991642, mIoU : 0.05821323919384143
INFO - 2023-02-10 18:15:06,954 - semantic_segmentation - Finished testing
I trained the model on S3DIS/Stanford3dDataset_v1.2, not the aligned version, with max_epochs set to 800.
Does anyone else get such poor results?
I had the same problem as you, and I remedied this issue by modifying SemSegLoss in semseg_loss.py (from line 40). I don't know why the previous code returns a 2-D array from DataProcessing.get_class_weights(dataset.cfg.class_weights); dataset.cfg.class_weights is already a 1-D array, which is all we need. You can try it by modifying the installed file site-packages/open3d/_ml3d/torch/modules/semseg_loss.py:
import torch
import torch.nn as nn


class SemSegLoss(object):
    """Loss functions for semantic segmentation."""

    def __init__(self, pipeline, model, dataset, device):
        super(SemSegLoss, self).__init__()
        # weighted_CrossEntropyLoss
        if 'class_weights' in dataset.cfg.keys() and len(
                dataset.cfg.class_weights) != 0:
            # The old code wrapped the weights in an extra leading axis:
            # class_wt = DataProcessing.get_class_weights(
            #     dataset.cfg.class_weights)
            class_wt = dataset.cfg.class_weights  # already a 1-D list of per-class weights
            weights = torch.tensor(class_wt, dtype=torch.float, device=device)
            self.weighted_CrossEntropyLoss = nn.CrossEntropyLoss(weight=weights)
        else:
            self.weighted_CrossEntropyLoss = nn.CrossEntropyLoss()
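For context on why the original code fails: nn.CrossEntropyLoss expects a 1-D weight tensor with one entry per class, while get_class_weights wraps the weights in an extra leading axis, which is exactly the [1, 13] shape from the error message. A minimal sketch that reproduces the error (the logits, labels, and weight values here are made-up placeholders):

import torch
import torch.nn as nn

logits = torch.randn(4, 13)   # 4 sample points, 13 S3DIS classes
labels = torch.randint(0, 13, (4,))

good = torch.ones(13)         # shape [13]: one weight per class, accepted
bad = torch.ones(1, 13)       # shape [1, 13]: what get_class_weights returns

print(nn.CrossEntropyLoss(weight=good)(logits, labels))  # works
nn.CrossEntropyLoss(weight=bad)(logits, labels)           # RuntimeError: weight tensor should be
                                                          # defined either for all 13 classes or
                                                          # no classes but got shape: [1, 13]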
Thank you @srzxDragon, this fix worked. Now the code runs without passing an empty class_weights list. But the results are still very poor: mIoU 0.10.
INFO - 2023-02-22 11:14:22,952 - semantic_segmentation - Loss train: 1.231 eval: 1.247
INFO - 2023-02-22 11:14:22,952 - semantic_segmentation - Mean acc train: 0.175 eval: 0.237
INFO - 2023-02-22 11:14:22,953 - semantic_segmentation - Mean IoU train: 0.101 eval: 0.123
INFO - 2023-02-22 11:14:23,090 - semantic_segmentation - Epoch 800: save ckpt to ./logs/KPFCNN_S3DIS_torch/checkpoint
Are your training results with KPConv also like this?
I didn't train KPConv on S3DIS, but I trained RandLA-Net on Semantic3D. I found a similar issue to yours: the performance is not as good as the official results. My results are as follows, but the official results report 76.0 mIoU. I have no idea about this issue.
Hi, I meet the same problem, but besides this there is another warning, as follows:
UserWarning: An output with one or more elements was resized since it had shape [180224], which does not match the required output shape [4, 45056]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:17.) return torch.stack(batch, 0, out=out)
Have you met this problem?
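For reference, this warning is raised by torch.stack inside PyTorch's default collate function (the return torch.stack(batch, 0, out=out) line in the traceback) when the preallocated out tensor's shape doesn't match the stacked result. A minimal sketch that reproduces it, with the sizes taken from the warning text:

import torch

batch = [torch.zeros(45056) for _ in range(4)]  # four per-sample tensors from the loader
out = torch.empty(180224)                       # flat buffer; stacking needs shape [4, 45056]
torch.stack(batch, 0, out=out)                  # emits the UserWarning about resizing `out`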
I don't remember exactly, but I don't think I encountered this warning. You can see my error log in the post.
Let's see the get_class_weights function:
import numpy as np


class DataProcessing:

    @staticmethod
    def get_class_weights(num_per_class):
        # num_per_class: pre-calculated number of points in each category
        num_per_class = np.array(num_per_class, dtype=np.float32)
        weight = num_per_class / float(sum(num_per_class))  # class frequencies
        ce_label_weight = 1 / (weight + 0.02)               # inverse-frequency weights
        return np.expand_dims(ce_label_weight, axis=0)      # note the extra leading axis
So does anyone know why np.expand_dims is needed? I changed it to the following:
    return ce_label_weight
It works.
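To make the shapes concrete, here is a quick sketch of what the function computes (the per-class point counts are made-up placeholders):

import numpy as np

num_per_class = [100.0, 200.0, 700.0]                  # hypothetical point counts for 3 classes
weight = np.array(num_per_class, dtype=np.float32)
weight = weight / weight.sum()                         # class frequencies, summing to 1
ce_label_weight = 1 / (weight + 0.02)                  # inverse frequency: rare classes weigh more

print(ce_label_weight.shape)                           # (3,)   -> what nn.CrossEntropyLoss accepts
print(np.expand_dims(ce_label_weight, axis=0).shape)   # (1, 3) -> the shape behind the error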
Thank you for your reply.
Did you guys find a solution for the low IoU? @shayan-nikoo @srzxDragon
Unfortunately not. I think it is an implementation issue in Open3D-ML, because I am using all of their default settings.
Checklist
- I have checked the release documentation and the latest documentation (for the master branch).

Describe the issue
I am trying to train the KPConv PyTorch semantic segmentation model on the S3DIS dataset but keep getting:
RuntimeError: weight tensor should be defined either for all 13 classes or no classes but got weight tensor of shape: [1, 13]
First, I noticed that pickles are not created for all files: preprocessing stops at Area5/office18. I fixed the line below and the preprocessing works fine.

Steps to reproduce the bug
This is the command I run:
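Roughly, a standard Open3D-ML training invocation for this setup looks like the following sketch (the dataset path is a placeholder, and the exact flags may differ):

python scripts/run_pipeline.py torch \
    -c ml3d/configs/kpconv_s3dis.yml \
    --dataset.dataset_path /path/to/Stanford3dDataset_v1.2 \
    --pipeline SemanticSegmentation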
The config file is the same as ml3d/configs/kpconv_s3dis.yml; I only added two parameters to the pipeline config file:

Error message

Expected behavior
Run the training on S3DIS with no error.

Open3D, Python and System information

Additional information