Hi,
Great work and a very rigorous analysis! I will try to help, but I do not have access to my work machine during the lockdown and therefore cannot run experiments myself.
1) For RGB-less clouds, there is no way for the network to distinguish road from grass. If anything, I am very surprised that models 1, 4, 5 and 6 were able to figure it out!
What I think is going on is that road/grass, being very planar and horizontal, end up in huge superpoints. Hence, there must be only a dozen or so of them in the test set, and a single error can ruin the IoU.
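To give a rough feel for the numbers, here is a toy illustration (made-up numbers, nothing to do with your actual scans):
```python
# Toy numbers: one class covered by about a dozen superpoints, the largest of
# which holds 40% of the class's points. Misclassify only that one superpoint:
tp = 0.6 * 1_000_000   # road points inside correctly labelled superpoints
fn = 0.4 * 1_000_000   # road points inside the single huge mislabelled superpoint
fp = 0                 # assume nothing else got labelled as road
iou = tp / (tp + fp + fn)
print(iou)             # 0.6 -> a single mistake already costs 40 IoU points
```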
Here are some leads to better understand what is going on, and to improve the results:
- remove the jittering
- shrink the superpoints (e.g. by concatenating xyz * 0.02 to the features)
- train with the RGB values
2) The IoU and oAcc fluctuate wildly because there are massive superpoints of road/grass which are indistinguishable. I think you have the right idea with the validation set, provided it is chosen wisely. What are the scenes in your train/valid/test split?
Also I cannot see the validation performance in your plots. Using the val set has helped me a lot to stabilize the performance on S3DIS/vKITTI (but I haven't tried it for sema3d yet).
3) This is very surprising to me. Can you check, at line 98 of spg, that the edge feats are indeed a column of 1s with the 'constant' parameter?
Note: class 7 is artifact and not necessarily pedestrians, although they do cause artifacts.
Hi Loic,
Sorry for my delay in getting back to you. I greatly appreciate all your suggestions! There is no need for you to run experiments yourself; it would be great if you could keep pointing out possible issues and giving suggestions.
I have tried all your suggestions: removing the jittering, shrinking the superpoints by concatenating xyz * 0.02, and training with RGB values. It did help, but the training curves remain unstable. Is this normal? Is it because of the random subgraph strategy?
Here is my training/validation/test split:
Yes, edge_feats is a column of 1.0 with the 'constant' parameter.
There is an instability due to the fact that the loss operates on superpoints and is unaware of the consequences of its decisions (i.e. the size of the superpoints involved), which can be enormous.
A fix (which we didn't keep in the original paper because it decreased the mIoU on S3DIS) would be to weight each term of the cross-entropy corresponding to a superpoint by its size, i.e. its number of points (normalizing by the total size, of course). It should be fairly straightforward; this size is given by segm_size_cpu. Just change the loss to something along the lines of:
(logit.index_select(-1, targets[:,0]+1e-7) * segm_size_cpu).sum() / segm_size_cpu.sum()
Disclaimer: this exact line won't work because some tensors need to be put on the GPU.
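Something in this spirit should work (an untested sketch; the variable names follow this thread and may not match main.py exactly):
```python
import torch
import torch.nn.functional as F

# Untested sketch of the size-weighted loss. Assumes `outputs` are the raw logits
# of shape [n_superpoints, n_classes], `labels` is a LongTensor of target classes,
# and `segm_size` holds the number of points of each superpoint.
def size_weighted_cross_entropy(outputs, labels, segm_size):
    segm_size = segm_size.to(outputs.device).float()
    per_superpoint = F.cross_entropy(outputs, labels, reduction='none')  # one term per superpoint
    return (per_superpoint * segm_size).sum() / segm_size.sum()
```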
Another thing you could do is merge the grass and road classes, especially in the RGB-less setting. Having such prominent yet indistinguishable classes must really make learning harder.
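If you go that way, the remapping itself is simple; a sketch with placeholder class ids (not the actual Semantic3D label ids):
```python
import numpy as np

def merge_road_and_grass(labels, road_id=1, grass_id=2):
    # Map grass onto the road id so both become a single 'ground' class.
    # The ids here are placeholders, not the actual Semantic3D label ids.
    remap = np.arange(labels.max() + 1)
    remap[grass_id] = road_id
    return remap[labels]
```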
Thanks, that makes sense! I'll revise the loss function and let you know the results. Do you have any idea why the NoEdgeFeat models perform so well?
Hi Loic,
I get your idea of weighting each term of the cross-entropy by the size of the superpoints, but I'm not sure I understand your code here. Do you mean dropping the cross_entropy and replacing the loss (lines 205 and 256 in main.py) with the following?
loss = (outputs.index_select(-1, label_mode) * segm_size).sum() / segm_size.sum()
It seems that the index_select here raises an index-out-of-bounds exception.
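From what I can tell, index_select returns whole slices (one column per index entry) rather than one entry per row, so maybe a per-row pick with gather is what you meant, e.g. (same names as above, just a sketch):
```python
picked = outputs.gather(1, label_mode.unsqueeze(1)).squeeze(1)  # one selected value per superpoint
```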
Merging the road and grass classes should improve the performance to a great extent, but I need to distinguish them for my task and my data. After shrinking the superpoints and removing the jittering, I trained 8 (colored) models (without the loss fix) and found that 4 of them perform much worse, with mIoUs under 58%. From the confusion matrices below, it looks like the (latter two) bad models can't correctly classify bush and scape points. Have I done something wrong?
meanIoU 72.1%, 56.2%, 52.7%
This is strange, I haven't observed this behaviour myself.
If you plot the test performance at each epoch, is it that the models end up at an unlucky epoch, or that they get stuck in a bad local minimum?
Hi Loic,
I merged the road and grass classes (= ground) and trained several RGB models using the same hyperparameters. The trained models perform better (72-78% mIoU) and are more stable, but there are still a few bad models (1 out of 10). These models did end up at an unlucky epoch, but it seems that the IoUs of some classes (e.g. car) don't converge. Given the wildly and irregularly fluctuating mIoU curve, I'm not sure early stopping works, because the best model on the validation set could still perform poorly on the test set.
A good model with mIoU 73.4%
A bad one with mIoU 57.3%
I also tried your loss fix (weighting each term by the size of each superpoint), but the model isn't trainable due to the extremely imbalanced superpoint sizes. Unfortunately, using --loss_weights didn't help either. Besides, I was wondering whether the unstable training curves are caused by the loss function. If so, why do the test curves remain stable at the same time?
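One thing I might try next (untested, reusing the names from the loss sketch above) is damping the per-superpoint weights, e.g. a square root plus a cap, so that a handful of enormous superpoints cannot dominate the loss:
```python
import torch
import torch.nn.functional as F

# Untested idea: sqrt + cap on the superpoint sizes to temper the imbalance.
def damped_size_weighted_ce(outputs, labels, segm_size, cap=10000.0):
    w = torch.sqrt(segm_size.to(outputs.device).float().clamp(max=cap))
    per_superpoint = F.cross_entropy(outputs, labels, reduction='none')
    return (per_superpoint * w).sum() / w.sum()
```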
Hi,
This is a really high-quality thread, thanks to both of you! I was wondering if perhaps batch normalization might be partly responsible for the fluctuations at the end of training (the running statistics are updated independently of the learning rate). One might try putting the following after https://github.com/loicland/superpoint_graph/blob/ssp%2Bspg/learning/main.py#L179:
```python
if epoch >= args.epochs - 10:
    # freeze the running statistics of all BatchNorm layers for the last 10 epochs:
    # the affine parameters keep training, but the running mean/var stop updating
    def set_eval(m):
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):  # nn = torch.nn
            m.eval()
    model.apply(set_eval)
```
The snippet requires passing epoch to train(): https://github.com/loicland/superpoint_graph/blob/ssp%2Bspg/learning/main.py#L176 as def train(epoch): and https://github.com/loicland/superpoint_graph/blob/ssp%2Bspg/learning/main.py#L329 as acc, loss, oacc, avg_iou = train(epoch).
Hi Loic,
I truly enjoyed your paper and have played with your code for quite a while (mostly training SPG models from scratch on Semantic3D). Thank you so much for sharing it :) Recently, I've run into some weird situations and would appreciate some help.
1. Some trained models can't distinguish between road and grass. I ran the partition code on Semantic3D (your 11/4 split) once and evaluated the partition performance ("perfect predictions") by assigning each superpoint its majority label. From the scores of model 0 and the visual partition results, we can be confident that the partition part works well. Based on the same partition results, I trained several models from scratch using your latest source code without any modifications. Specifically, I trained 6 models without RGB using the same hyperparameter settings (s1). However, two of the trained models perform much worse (mIoUs around 55%) than the others (mIoUs near 70%). Both the per-class IoUs (models 2 & 3) and the visual prediction results (the misclassification appears in the sg27_4 scan) show that these two models are unable to separate road and grass points. One of the RGB models (model 8) I trained from scratch (s2) also suffers from this issue. Is it because of the multisample strategy or the random seeds? How can I avoid this?
scores on sema3d (11/4)
the partition and classification results of sg27_4
hyperparameter settings (note that the 4 validation scans are stored in the testfull folder)
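For reference, the "perfect prediction" numbers above were obtained roughly as in the sketch below: every superpoint is assigned the majority ground-truth label of its points (the array names are hypothetical, not the ones in the repo).
```python
import numpy as np

def oracle_labels(point_labels, point_to_superpoint, n_classes):
    # Per-superpoint histogram of ground-truth labels, then take the majority.
    n_sp = point_to_superpoint.max() + 1
    hist = np.zeros((n_sp, n_classes), dtype=np.int64)
    np.add.at(hist, (point_to_superpoint, point_labels), 1)
    majority = hist.argmax(axis=1)          # majority label of each superpoint
    return majority[point_to_superpoint]    # broadcast back to the points
```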
2. Why do the pointwise curves (mIoU & oAcc) fluctuate wildly? To overcome the issue mentioned above, I tried early stopping (--use_val_set '1') on a custom split of Semantic3D: 9/3/3 training/validation/test scans. Unfortunately, it didn't help, since the pointwise mIoU curve fluctuates irregularly, which means we can't guarantee that the saved model (the best on the validation set) performs well on the test set. If I use count_predicted_batch_hard instead of count_predicted_batch to construct a superpoint-wise confusion matrix, the curves become relatively stable, but then I can't evaluate the model on raw points. Any suggestions for saving the best model?
pointwise mIoU and oAcc curves (rows 2 & 3) on your split of 11/4 scans
superpoint-wise curves
3. The NoEdgeFeat models perform unexpectedly well. I was interested in the performance of the NoEdgeFeat model on Semantic3D, so I trained two models without any super-edge information (one with and one without color) by setting --edge_attribs 'constant'. I was surprised by the evaluation scores (see models 9 & 10): there is no difference between the models with and without super-edge features, which is the opposite of your ablation study on S3DIS (Table 5 in your SPG paper). Any ideas?
Many thanks!