baurst opened this issue 2 years ago
Hi @baurst, thank you for your interest! Here are some step-by-step suggestions to check the reproduction:
1. **KITTI-SF pre-processing**

```
python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}
```
I've uploaded our processed data to: https://www.dropbox.com/s/vpibaeu1yx1kpeg/kittisf.zip?dl=0. You can compare your version with it. The data processing is deterministic and this step should produce exactly the same results.
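A comparison along these lines can be scripted. Below is a minimal sketch, assuming both copies store their frames as `.npy` files under matching relative paths; the paths and file format here are placeholders, so adjust them to whatever the repo actually produces:

```python
# Hypothetical helper (not part of the repo): compare your processed
# KITTI-SF data against the reference copy, array by array.
from pathlib import Path
import numpy as np

def compare_trees(mine, reference, atol=1e-6):
    for ref_file in sorted(Path(reference).rglob("*.npy")):
        my_file = Path(mine) / ref_file.relative_to(reference)
        a, b = np.load(my_file), np.load(ref_file)
        if a.shape != b.shape or not np.allclose(a, b, atol=atol):
            print("MISMATCH:", ref_file.relative_to(reference))

compare_trees("kittisf_mine", "kittisf_reference")  # placeholder paths
```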
2. **Scene flow estimation on KITTI-SF**

```
python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split train --test_model_iters 5 --save
python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split val --test_model_iters 5 --save
```
Here are our outputs:
```
# --split train
Evaluation on kittisf-train: {'EPE': 0.12648308038711548, 'AccS': 0.5746897603012622, 'AccR': 0.7554271269217133, 'Outlier': 0.3829704918945208}
# --split val
Evaluation on kittisf-val: {'EPE': 0.15136219955515118, 'AccS': 0.6428256083279849, 'AccR': 0.7688812624663115, 'Outlier': 0.3612103173363721}
```
This step is also deterministic and should produce exactly the same results.
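For readers unfamiliar with these metrics: EPE is the mean endpoint error, AccS/AccR are strict/relaxed accuracy, and Outlier is the outlier ratio. A sketch of how they are commonly computed in scene flow evaluations follows; this is illustrative, and the repo's exact thresholds may differ:

```python
# Common scene-flow metrics (FlowNet3D-style conventions); illustrative only.
import numpy as np

def flow_metrics(pred, gt):
    err = np.linalg.norm(pred - gt, axis=-1)        # per-point endpoint error
    rel = err / (np.linalg.norm(gt, axis=-1) + 1e-8)
    return {
        "EPE": err.mean(),
        "AccS": np.mean((err < 0.05) | (rel < 0.05)),  # strict accuracy
        "AccR": np.mean((err < 0.1) | (rel < 0.1)),    # relaxed accuracy
        "Outlier": np.mean((err > 0.3) | (rel > 0.1)),
    }
```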
3. **KITTI-SF downsampling**
```
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d
```
I've uploaded our downsampled data to: https://www.dropbox.com/s/r2lq98afy61u6de/kittisf_downsampled.zip?dl=0.
You can compare your version with it. Again, this step is deterministic and should produce exactly the same results.
4. **Train segmentation - Round 1**
```
python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round 1
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 1 --test_batch_size 4 --save
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 4 --save
```
Here are our outputs for the OA-ICP algorithm:
```
# --split train
Original flow: {'EPE': 0.1221223770081997, 'AccS': 0.551214599609375, 'AccR': 0.7276544189453125, 'Outlier': 0.433809814453125}
Weighted Kabsch flow: {'EPE': 0.12027331572026015, 'AccS': 0.520142822265625, 'AccR': 0.7200775146484375, 'Outlier': 0.451470947265625}
Object-Aware ICP flow: {'EPE': 0.022878126529976724, 'AccS': 0.943924560546875, 'AccR': 0.9632830810546875, 'Outlier': 0.2250048828125}
# --split val
Original flow: {'EPE': 0.14206457536667585, 'AccS': 0.6343487548828125, 'AccR': 0.765782470703125, 'Outlier': 0.34766357421875}
Weighted Kabsch flow: {'EPE': 0.144663465321064, 'AccS': 0.5925579833984375, 'AccR': 0.743780517578125, 'Outlier': 0.3667724609375}
Object-Aware ICP flow: {'EPE': 0.0639616659656167, 'AccS': 0.8458380126953124, 'AccR': 0.88419189453125, 'Outlier': 0.17696044921875}
```
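For context, the "Weighted Kabsch flow" presumably denotes the flow regenerated by fitting a per-object rigid transform to the estimated flow with a weighted Kabsch solver. A generic sketch of weighted Kabsch, as an illustration rather than the repo's implementation:

```python
# Fit the rigid transform (R, t) minimizing sum_i w_i * ||R p_i + t - q_i||^2.
import numpy as np

def weighted_kabsch(P, Q, w):
    w = w / w.sum()
    mu_p = (w[:, None] * P).sum(axis=0)           # weighted centroids
    mu_q = (w[:, None] * Q).sum(axis=0)
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))  # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t

P = np.random.rand(100, 3)
flow = 0.1 * np.random.rand(100, 3)
R, t = weighted_kabsch(P, P + flow, np.ones(100))
rigid_flow = (P @ R.T + t) - P                    # flow implied by the rigid fit
```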
In addition, you can take a look at your segmentation results after round 1:
```
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2
```
```
Evaluation on kittisf-val:
AveragePrecision@50: 0.26585884114941416
PanopticQuality@50: 0.21743871392992076
F1-score@50: 0.32455795677799604
Prec@50: 0.23817762399077277
Recall@50: 0.5092478421701603
{'per_scan_iou_avg': 0.5011032064259052, 'per_scan_iou_std': 0.018945805206894876, 'per_scan_ri_avg': 0.521771999001503, 'per_scan_ri_std': 0.0036116865277290343}
```
In this step, due to the [non-deterministic PyTorch operators](https://pytorch.org/docs/stable/notes/randomness.html) in training, you may not reproduce exactly the same values even with fixed random seed. However, the segmentation and scene flow improvement results should be close to ours. The **scene flow improvement (Object-Aware ICP flow)** results on training split are especially important, as the segmentation in round 2 depends on it.
(By the way, the "Original flow" values in the OA-ICP outputs should be exactly the same as ours; otherwise the previous downsampling step is not correct.)
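For reference, a typical seeding setup looks like the sketch below; as the linked notes explain, some GPU kernels stay non-deterministic even with all of this in place (the snippet is illustrative, not taken from the repo):

```python
# Standard reproducibility settings for PyTorch experiments.
import random
import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
```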
5. **Train segmentation - Round 2**
```
python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round 2
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round 2 --test_batch_size 2
```
If you have reproduced exactly the same results in steps 1-3 and similar results in step 4, your results in this step should be close to ours reported in Table 3 of the paper.
Once you have fixed the training on KITTI-SF, testing on KITTI-Det and SemanticKITTI is also expected to work fine.
Hope this helps!
Thank you so much for taking the time to investigate this and for uploading your results. The resulting data and metrics, including those of step 2, are pretty much identical to the output I am getting.
For step 3 I got different results from you (compared via `np.allclose(...)` against the downsampled data you uploaded to Dropbox). I rebuilt the PointNet2 extensions and reran the pipeline, and now the data matches for step 3, so that's good news. The only explanation I have at the moment is that maybe the `setup.py` script of the PointNet2 extensions pulled in a different CUDA version than the one my conda PyTorch uses, and instead of crashing it failed silently? I'm not sure. It seems to be fixed now. Thank you again for your help. :)
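For anyone hitting the same thing, a quick sanity check for this kind of mismatch (my own suggestion, not something from the repo) is to compare the CUDA version the PyTorch build expects with the `nvcc` toolkit that compiled the extension:

```python
# Print the CUDA version PyTorch was built with vs. the nvcc on PATH.
import subprocess
import torch

print("PyTorch built with CUDA:", torch.version.cuda)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```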
I am now rerunning the pipeline and will report back with new results, but I am reasonably confident that this could have been the issue.
Hi, thank you very much for all your help; it is very much appreciated!
After retraining the whole thing, I got the following results:
```
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 1 --test_batch_size 2 --save
```
```
Original flow: {'EPE': 0.12211721856147051, 'AccS': 0.5512310791015625, 'AccR': 0.7276416015625, 'Outlier': 0.4337994384765625}
Weighted Kabsch flow: {'EPE': 0.12256817236542701, 'AccS': 0.5357562255859375, 'AccR': 0.72188720703125, 'Outlier': 0.444791259765625}
Object-Aware ICP flow: {'EPE': 0.026448437687940897, 'AccS': 0.94170654296875, 'AccR': 0.9593218994140625, 'Outlier': 0.2265020751953125}
```
```
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2 --save
```
```
Original flow: {'EPE': 0.14206901295110583, 'AccS': 0.634364013671875, 'AccR': 0.76574951171875, 'Outlier': 0.34768310546875}
Weighted Kabsch flow: {'EPE': 0.1433044557645917, 'AccS': 0.6090081787109375, 'AccR': 0.7522802734375, 'Outlier': 0.3580340576171875}
Object-Aware ICP flow: {'EPE': 0.06135747742839157, 'AccS': 0.859599609375, 'AccR': 0.8907073974609375, 'Outlier': 0.1681353759765625}
```
```
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2
```
```
Evaluation on kittisf-val:
AveragePrecision@50: 0.3040004303378626
PanopticQuality@50: 0.22565737308702502
F1-score@50: 0.3401574803149606
Prec@50: 0.2498554077501446
Recall@50: 0.5326757090012331
{'per_scan_iou_avg': 0.5131771497428417, 'per_scan_iou_std': 0.017402197439223527, 'per_scan_ri_avg': 0.5475729809701443, 'per_scan_ri_std': 0.004504442065954208}
```
So this all looks good; for round 1 the segmentation result is even better than the one you reported in the post above.
Here is where it gets a bit weird for me:
```
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 2 --test_batch_size 2 --save
```
```
Original flow: {'EPE': 0.026448437687940897, 'AccS': 0.94170654296875, 'AccR': 0.9593218994140625, 'Outlier': 0.2265020751953125}
Weighted Kabsch flow: {'EPE': 0.09985332327196375, 'AccS': 0.7451837158203125, 'AccR': 0.81739501953125, 'Outlier': 0.334610595703125}
Object-Aware ICP flow: {'EPE': 0.08384263168089091, 'AccS': 0.8469573974609375, 'AccR': 0.898587646484375, 'Outlier': 0.2753076171875}
```
Is it expected that the original flow is much better than the Weighted Kabsch flow and the Object-Aware ICP flow? I think this contradicts your statement:

> By the way, the "Original flow" values in the OA-ICP outputs should be exactly the same as ours; otherwise the previous downsampling step is not correct.
I'm not sure what I am doing wrong. It's probably best I delete all intermediate results and start again from the beginning.
```
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 2 --test_batch_size 2 --save
```
```
Loaded weights from /mnt/ssd4/selfsupervised_OD/ogc/ckpt_out/seg/kittisf/kittisf_unsup_woinv_R2/best.pth.tar
Original flow: {'EPE': 0.06135747742839157, 'AccS': 0.859599609375, 'AccR': 0.8907073974609375, 'Outlier': 0.1681353759765625}
Weighted Kabsch flow: {'EPE': 0.10845257709734142, 'AccS': 0.7518304443359375, 'AccR': 0.8445501708984375, 'Outlier': 0.2459100341796875}
Object-Aware ICP flow: {'EPE': 0.08291638159193099, 'AccS': 0.87177490234375, 'AccR': 0.895157470703125, 'Outlier': 0.168624267578125}
```
Here the difference is not that big.
Results after training using `python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round 2`:

```
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round 2 --test_batch_size 2
```
```
AveragePrecision@50: 0.5843461014413663
PanopticQuality@50: 0.43964455737869956
F1-score@50: 0.5475504322766571
Prec@50: 0.4418604651162791
Recall@50: 0.7196969696969697
{'per_scan_iou_avg': 0.721262679696083, 'per_scan_iou_std': 0.020856219343986595, 'per_scan_ri_avg': 0.9482875975966454, 'per_scan_ri_std': 0.0019339963793754578}
```
That looks very good!
```
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round 2 --test_batch_size 2
```
```
AveragePrecision@50: 0.44320909604001196
PanopticQuality@50: 0.3760641999234322
F1-score@50: 0.47724974721941354
Prec@50: 0.40445586975149955
Recall@50: 0.5819975339087546
{'per_scan_iou_avg': 0.6221984243392944, 'per_scan_iou_std': 0.014200684800744056, 'per_scan_ri_avg': 0.92194748878479, 'per_scan_ri_std': 0.002544737458229065}
```
This is about 10% lower than what you reported in the paper, indicating I must have made a mistake somewhere. Thanks again for your help!
Just to be sure, this is how I run the experiment pipeline. Am I missing something critical?
```
KITTI_SF="/mnt/ssd4/selfsupervised_OD/ogc/kitti_sf"
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
export CUDA_VISIBLE_DEVICES=1

python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d

for ROUND in $(seq 1 2); do
    python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
done

# ROUND will remain as 2 here
python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
```
`python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}` would run twice with ROUND==2, which does not seem right. I should delete the last 3 commands in the script, right?
Hi @baurst, thanks for your feedback!
In your reported results, the SF-train and SF-val runs of round 2 are not needed. In round 1, we train the segmentation and improve the scene flow; in round 2, we only train the segmentation (with the improved flow) and report it as the final segmentation result. So your experiment pipeline is expected to be:
```
KITTI_SF="/mnt/ssd4/selfsupervised_OD/ogc/kitti_sf"
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
export CUDA_VISIBLE_DEVICES=1

python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d

# ROUND = 1 (No loop here! Only run ROUND=1; use "kittisf_unsup_woinv.yaml" for training and testing)
ROUND=1
python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2

# ROUND = 2
ROUND=2
python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
```
You are expected to get only two trained models, `kittisf_unsup_woinv_R1` and `kittisf_unsup_R2`, in the `ckpt/seg/kittisf` directory.
In your pipeline, what confuses me is:

```
for ROUND in $(seq 1 2); do
    python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
done
```
The `python test_seg.py config/seg/kittisf/kittisf_unsup.yaml` call will look for `ckpt/seg/kittisf_unsup_R{ROUND}`, which should not exist at this stage, because you perform the training via `python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml` and can only get `ckpt/seg/kittisf_unsup_woinv_R{ROUND}`. So maybe intermediate results from previous runs were used? It might be a good idea to delete all current results and train from the beginning.
If you have fixed the experiment pipeline, there might be another reason for the failure to reproduce. As you can see, in testing we load the model with the lowest validation loss: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/test_seg.py#L80-L83 However, this may not be the model with the best performance. I have occasionally met such a case before, as shown below:
A quick solution is to load the model from the final epoch instead: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/test_seg.py#L76-L79 Or you can save models from different epochs during training and select among them: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/train_seg.py#L212-L217
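A minimal sketch of that second option, using a hypothetical `save_periodic_checkpoint` helper rather than the repo's actual training code:

```python
# Save a checkpoint every `every` epochs so several candidates can be
# evaluated afterwards, instead of trusting the best-val-loss model alone.
import os
import torch

def save_periodic_checkpoint(model, optimizer, epoch, ckpt_dir, every=10):
    if epoch % every != 0:
        return
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(
        {"epoch": epoch,
         "model_state": model.state_dict(),
         "optimizer_state": optimizer.state_dict()},
        os.path.join(ckpt_dir, "epoch_%03d.pth.tar" % epoch),
    )
```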
The training log on TensorBoard can also help you debug. I've reproduced the results with different random seeds, and we always get an F1 score of ~50 and a PQ score of ~40.
Thank you very much for your help and detailed explanation! I will delete the intermediate results and try again. :)
I did not know that the experiment has TensorBoard support! I could never find any TensorBoard logs, so I assumed no TensorBoard logging was active. It turns out the summaries were not written because the log_dir did not exist on my machine, so no event files could be created. I opened PR #3, which creates the log_dir before the run so that others can use TensorBoard as well.
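The gist of the fix is simply to ensure the directory exists before the writer starts emitting event files; a minimal sketch (the path is a placeholder and the actual PR may differ):

```python
# Create the log_dir up front so TensorBoard event files can be written.
import os
from torch.utils.tensorboard import SummaryWriter

log_dir = "ckpt_out/seg/kittisf/logs"  # placeholder path
os.makedirs(log_dir, exist_ok=True)
writer = SummaryWriter(log_dir=log_dir)
writer.add_scalar("train/loss", 0.5, global_step=0)
writer.close()
```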
Hi,
thank you for publishing the code for your very interesting paper!
Could you please take a look at the steps I followed to try to reproduce the results in the paper? Clearly I must be doing something wrong, but I cannot figure out what, since there are a lot of steps involved. Thank you very much in advance for taking a look. Your help is very much appreciated!
Here is how I adapted the experiment (mainly data and save paths) to my machine:

- `config/flow/kittisf/kittisf_unsup.yaml`
- `config/seg/kittidet/kittisf_unsup.yaml`
- `config/seg/kittisf/kittisf_sup.yaml`
- `config/seg/kittisf/kittisf_unsup.yaml`
- `config/seg/kittisf/kittisf_unsup_woinv.yaml`
- `config/seg/semantickitti/kittisf_unsup.yaml`
After this, I did the following steps:
For the last command I am getting:

```
AveragePrecision@50: 0.3241964006222572
PanopticQuality@50: 0.2567730165763252
F1-score@50: 0.35737439222042144
Prec@50: 0.26614363307181654
Recall@50: 0.5437731196054254
{'per_scan_iou_avg': 0.5634193836152553, 'per_scan_iou_std': 0.020407961700111627, 'per_scan_ri_avg': 0.6674587628245354, 'per_scan_ri_std': 0.00429959088563919}
```
I am getting:

```
AveragePrecision@50: 0.13945170257439435
PanopticQuality@50: 0.1318724309223011
F1-score@50: 0.19702186647587533
Prec@50: 0.13796774698606545
Recall@50: 0.3444609491048393
{'per_scan_iou_avg': 0.45250289306404357, 'per_scan_iou_std': 0.0, 'per_scan_ri_avg': 0.4861106249785733, 'per_scan_ri_std': 0.0}
```

```
AveragePrecision@50: 0.10315215577576131
PanopticQuality@50: 0.0989709766834506
F1-score@50: 0.15591615175838772
Prec@50: 0.10372148859543817
Recall@50: 0.3138531283601174
{'per_scan_iou_avg': 0.4351089967498311, 'per_scan_iou_std': 0.0, 'per_scan_ri_avg': 0.4129963953279687, 'per_scan_ri_std': 0.0}
```
Am I doing something fundamentally wrong? Thanks again for taking a look!