vLAR-group / OGC

🔥OGC in PyTorch (NeurIPS 2022 & TPAMI 2024)

Questions about correct order and paths when running the full pipeline #2

Open baurst opened 2 years ago

baurst commented 2 years ago

Hi,

thank you for publishing the code to your very interesting paper!

Could you please take a look at the steps I followed to try to reproduce the results in the paper? Clearly I must be doing something wrong, but I cannot figure out what, since there are a lot of steps involved. Thank you very much in advance for taking a look; your help is very much appreciated!

Here is how I adapted the experiment to my machine (mainly data and save paths), followed by the steps I ran:

KITTI_SF="/mnt/ssd4/ogc/kitti_sf"
KITTI_DET="/mnt/ssd4/ogc/kitti_det"
SEMANTIC_KITTI="/mnt/ssd4/ogc/SemanticKITTI"

python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}

python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split train --test_model_iters 5 --save
python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split val --test_model_iters 5 --save

python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d

python data_prepare/kittidet/process_kittidet.py ${KITTI_DET}
python data_prepare/semantickitti/process_semantickitti.py ${SEMANTIC_KITTI}

for ROUND in $(seq 1 2); do
    python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
done

python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}

# KITTI-SF
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2

For the last command I am getting:

AveragePrecision@50: 0.3241964006222572
PanopticQuality@50: 0.2567730165763252 F1-score@50: 0.35737439222042144 Prec@50: 0.26614363307181654 Recall@50: 0.5437731196054254
{'per_scan_iou_avg': 0.5634193836152553, 'per_scan_iou_std': 0.020407961700111627, 'per_scan_ri_avg': 0.6674587628245354, 'per_scan_ri_std': 0.00429959088563919}

# KITTI-Det
python test_seg.py config/seg/kittidet/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2

I am getting:

AveragePrecision@50: 0.13945170257439435
PanopticQuality@50: 0.1318724309223011 F1-score@50: 0.19702186647587533 Prec@50: 0.13796774698606545 Recall@50: 0.3444609491048393
{'per_scan_iou_avg': 0.45250289306404357, 'per_scan_iou_std': 0.0, 'per_scan_ri_avg': 0.4861106249785733, 'per_scan_ri_std': 0.0}

# SemanticKITTI
python test_seg.py config/seg/semantickitti/kittisf_unsup.yaml --round ${ROUND} --test_batch_size 2

AveragePrecision@50: 0.10315215577576131
PanopticQuality@50: 0.0989709766834506 F1-score@50: 0.15591615175838772 Prec@50: 0.10372148859543817 Recall@50: 0.3138531283601174
{'per_scan_iou_avg': 0.4351089967498311, 'per_scan_iou_std': 0.0, 'per_scan_ri_avg': 0.4129963953279687, 'per_scan_ri_std': 0.0}

Am I doing something fundamentally wrong? Thanks again for taking a look!

Szy-Young commented 2 years ago

Hi @baurst, thank you for your interest! Here are some step-by-step suggestions to check the reproduction:

  1. KITTI-SF pre-processing

    python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}

    I've uploaded our processed data to: https://www.dropbox.com/s/vpibaeu1yx1kpeg/kittisf.zip?dl=0. You can compare your version with it. The data processing is deterministic and this step should produce exactly the same results.

  2. Scene flow estimation on KITTI-SF

    python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split train --test_model_iters 5 --save
    python test_flow_kittisf.py config/flow/kittisf/kittisf_unsup.yaml --split val --test_model_iters 5 --save

    Here are our outputs:

    
    # --split train
    Evaluation on kittisf-train: {'EPE': 0.12648308038711548, 'AccS': 0.5746897603012622, 'AccR': 0.7554271269217133, 'Outlier': 0.3829704918945208}

# --split val

Evaluation on kittisf-val: {'EPE': 0.15136219955515118, 'AccS': 0.6428256083279849, 'AccR': 0.7688812624663115, 'Outlier': 0.3612103173363721}

This step is also deterministic and should produce exactly the same results.

3. **KITTI-SF downsampling**

python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d

I've uploaded our downsampled data to: https://www.dropbox.com/s/r2lq98afy61u6de/kittisf_downsampled.zip?dl=0.
You can compare your version with it. Again, this step is deterministic and should produce exactly the same results.

4. **Train segmentation - Round 1**

python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round 1
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 1 --test_batch_size 4 --save
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 4 --save

Here are our outputs for the OA-ICP algorithm:

--split train

Original flow: {'EPE': 0.1221223770081997, 'AccS': 0.551214599609375, 'AccR': 0.7276544189453125, 'Outlier': 0.433809814453125}
Weighted Kabsch flow: {'EPE': 0.12027331572026015, 'AccS': 0.520142822265625, 'AccR': 0.7200775146484375, 'Outlier': 0.451470947265625}
Object-Aware ICP flow: {'EPE': 0.022878126529976724, 'AccS': 0.943924560546875, 'AccR': 0.9632830810546875, 'Outlier': 0.2250048828125}

--split val

Original flow: {'EPE': 0.14206457536667585, 'AccS': 0.6343487548828125, 'AccR': 0.765782470703125, 'Outlier': 0.34766357421875}
Weighted Kabsch flow: {'EPE': 0.144663465321064, 'AccS': 0.5925579833984375, 'AccR': 0.743780517578125, 'Outlier': 0.3667724609375}
Object-Aware ICP flow: {'EPE': 0.0639616659656167, 'AccS': 0.8458380126953124, 'AccR': 0.88419189453125, 'Outlier': 0.17696044921875}

Besides, you can also take a look at your segmentation results after round 1:

python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2

Our outputs:

Evaluation on kittisf-val:
AveragePrecision@50: 0.26585884114941416
PanopticQuality@50: 0.21743871392992076 F1-score@50: 0.32455795677799604 Prec@50: 0.23817762399077277 Recall@50: 0.5092478421701603
{'per_scan_iou_avg': 0.5011032064259052, 'per_scan_iou_std': 0.018945805206894876, 'per_scan_ri_avg': 0.521771999001503, 'per_scan_ri_std': 0.0036116865277290343}


In this step, due to the [non-deterministic PyTorch operators](https://pytorch.org/docs/stable/notes/randomness.html) in training, you may not reproduce exactly the same values even with fixed random seed. However, the segmentation and scene flow improvement results should be close to ours. The **scene flow improvement (Object-Aware ICP flow)** results on training split are especially important, as the segmentation in round 2 depends on it.

(By the way, the "Original flow" in the outputs of OA-ICP should be exactly the same, otherwise the previous downsampling step is not correct.)
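As a side note on the non-deterministic operators mentioned above, a generic seeding setup (plain PyTorch, not specific to this repository) looks roughly like the sketch below; it narrows the run-to-run variance but cannot remove it completely:

# Generic PyTorch seeding sketch; some CUDA kernels stay non-deterministic regardless.
import random
import numpy as np
import torch

def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # trades speed for reproducibility
    torch.backends.cudnn.benchmark = False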

5. **Train segmentation - Round 2**

python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round 2

python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round 2 --test_batch_size 2


If you have reproduced the same results in steps 1-3 and similar results in step 4, your results in this step should be close to those reported in Table 3 of the paper.

Once you have fixed the training on KITTI-SF, the testing on KITTI-Det and SemanticKITTI is also expected to work fine.

Hope this helps!
baurst commented 2 years ago

Thank you so much for taking the time to investigate this and for uploading your results. Up to and including step 2, the data and metrics are practically identical to the output I am getting.

For step 3 I got results different from yours (I compared against the downsampled data you uploaded to Dropbox using np.allclose(...)). I rebuilt the PointNet2 extensions and reran the pipeline, and now the data matches for step 3, so that's good news. The only explanation I have at the moment is that the setup.py script of the PointNet2 extensions may have pulled in a different CUDA version than my conda PyTorch uses, and instead of crashing it failed silently. Not sure. It seems to be fixed now. Thank you again for your help. :)
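In case it is useful to others, the comparison I ran was roughly the following (a minimal sketch; it assumes both versions store plain numeric .npy files under the same relative paths):

# Rough comparison of two processed-data trees; filenames and layout must match.
import os
import numpy as np

def compare_dirs(dir_a, dir_b, rtol=1e-5, atol=1e-8):
    for root, _, files in os.walk(dir_a):
        for fname in sorted(files):
            if not fname.endswith('.npy'):
                continue
            path_a = os.path.join(root, fname)
            path_b = os.path.join(dir_b, os.path.relpath(path_a, dir_a))
            a, b = np.load(path_a), np.load(path_b)
            if a.shape != b.shape or not np.allclose(a, b, rtol=rtol, atol=atol):
                print('Mismatch:', os.path.relpath(path_a, dir_a))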

I am now rerunning the pipeline and will report back with new results, but I am reasonably confident that this could have been the issue.

baurst commented 2 years ago

Hi, thank you very much for all your help, it is very much appreciated!

After retraining the whole thing, I got the following results:

Round 1:

SF - Train:

python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 1 --test_batch_size 2 --save
Original flow: {'EPE': 0.12211721856147051, 'AccS': 0.5512310791015625, 'AccR': 0.7276416015625, 'Outlier': 0.4337994384765625}
Weighted Kabsch flow: {'EPE': 0.12256817236542701, 'AccS': 0.5357562255859375, 'AccR': 0.72188720703125, 'Outlier': 0.444791259765625}
Object-Aware ICP flow: {'EPE': 0.026448437687940897, 'AccS': 0.94170654296875, 'AccR': 0.9593218994140625, 'Outlier': 0.2265020751953125}

SF - Val:

python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2 --save
Original flow: {'EPE': 0.14206901295110583, 'AccS': 0.634364013671875, 'AccR': 0.76574951171875, 'Outlier': 0.34768310546875}
Weighted Kabsch flow: {'EPE': 0.1433044557645917, 'AccS': 0.6090081787109375, 'AccR': 0.7522802734375, 'Outlier': 0.3580340576171875}
Object-Aware ICP flow: {'EPE': 0.06135747742839157, 'AccS': 0.859599609375, 'AccR': 0.8907073974609375, 'Outlier': 0.1681353759765625}

Segmentation - Val:

python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 1 --test_batch_size 2
Evaluation on kittisf-val:
AveragePrecision@50: 0.3040004303378626
PanopticQuality@50: 0.22565737308702502 F1-score@50: 0.3401574803149606 Prec@50: 0.2498554077501446 Recall@50: 0.5326757090012331
{'per_scan_iou_avg': 0.5131771497428417, 'per_scan_iou_std': 0.017402197439223527, 'per_scan_ri_avg': 0.5475729809701443, 'per_scan_ri_std': 0.004504442065954208}

So this all looks good; for Round 1, the segmentation result is even better than the one you reported in the post above.

Round 2:

Here is where it gets a bit confusing for me:

SF - Train:

python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round 2 --test_batch_size 2 --save
Original flow: {'EPE': 0.026448437687940897, 'AccS': 0.94170654296875, 'AccR': 0.9593218994140625, 'Outlier': 0.2265020751953125}
Weighted Kabsch flow: {'EPE': 0.09985332327196375, 'AccS': 0.7451837158203125, 'AccR': 0.81739501953125, 'Outlier': 0.334610595703125}
Object-Aware ICP flow: {'EPE': 0.08384263168089091, 'AccS': 0.8469573974609375, 'AccR': 0.898587646484375, 'Outlier': 0.2753076171875}

Is it expected that the original flow is much better than the Weighted Kabsch flow and the Object-Aware ICP flow? I think this contradicts your statement: "By the way, the 'Original flow' in the outputs of OA-ICP should be exactly the same, otherwise the previous downsampling step is not correct." I am not sure what I am doing wrong. It is probably best if I delete all intermediate results and start again from the beginning.

SF - Val:

python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round 2 --test_batch_size 2 --save
Loaded weights from /mnt/ssd4/selfsupervised_OD/ogc/ckpt_out/seg/kittisf/kittisf_unsup_woinv_R2/best.pth.tar
Original flow: {'EPE': 0.06135747742839157, 'AccS': 0.859599609375, 'AccR': 0.8907073974609375, 'Outlier': 0.1681353759765625}
Weighted Kabsch flow: {'EPE': 0.10845257709734142, 'AccS': 0.7518304443359375, 'AccR': 0.8445501708984375, 'Outlier': 0.2459100341796875}
Object-Aware ICP flow: {'EPE': 0.08291638159193099, 'AccS': 0.87177490234375, 'AccR': 0.895157470703125, 'Outlier': 0.168624267578125}

Here the difference is not that big.

Segmentation Train:

Results after training using python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round 2:

python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round 2 --test_batch_size 2 
AveragePrecision@50: 0.5843461014413663
PanopticQuality@50: 0.43964455737869956 F1-score@50: 0.5475504322766571 Prec@50: 0.4418604651162791 Recall@50: 0.7196969696969697
{'per_scan_iou_avg': 0.721262679696083, 'per_scan_iou_std': 0.020856219343986595, 'per_scan_ri_avg': 0.9482875975966454, 'per_scan_ri_std': 0.0019339963793754578}

That looks very good!

Segmentation Val:

python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round 2 --test_batch_size 2
AveragePrecision@50: 0.44320909604001196
PanopticQuality@50: 0.3760641999234322 F1-score@50: 0.47724974721941354 Prec@50: 0.40445586975149955 Recall@50: 0.5819975339087546
{'per_scan_iou_avg': 0.6221984243392944, 'per_scan_iou_std': 0.014200684800744056, 'per_scan_ri_avg': 0.92194748878479, 'per_scan_ri_std': 0.002544737458229065}

This is about 10% lower than what is reported in the paper, indicating I must have made a mistake somewhere. Thanks again for your help!

Just to be sure, this is how I run the experiment pipeline. Am I missing something critical?

KITTI_SF="/mnt/ssd4/selfsupervised_OD/ogc/kitti_sf"

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
export CUDA_VISIBLE_DEVICES=1

python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d

for ROUND in $(seq 1 2); do
    python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
done

# ROUND will remain as 2 here
python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2

python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND} would run twice with ROUND==2; that does not seem right. I should delete the last three commands in the script, right?

Szy-Young commented 2 years ago

Hi @baurst , thanks for your feedback!

1. Experiment pipeline

In your reported results, the SF-train and SF-val of Round 2 are not needed. In round 1, we train the segmentation and improve the scene flow; in round 2, we only train the segmentation (with the improved flow) and report it as the final segmentation result. So your experiment pipeline is expected to be:

KITTI_SF="/mnt/ssd4/selfsupervised_OD/ogc/kitti_sf"

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
export CUDA_VISIBLE_DEVICES=1

python data_prepare/kittisf/process_kittisf.py ${KITTI_SF}
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled
python data_prepare/kittisf/downsample_kittisf.py ${KITTI_SF} --save_root ${KITTI_SF}_downsampled --predflow_path flowstep3d           

# ROUND = 1 (No loop here! Only run ROUND=1; Use "kittisf_unsup_woinv.yaml" for training and testing)
python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2

# ROUND = 2
python train_seg.py config/seg/kittisf/kittisf_unsup.yaml --round ${ROUND}
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2

You are expected to get only two trained models, kittisf_unsup_woinv_R1 and kittisf_unsup_R2, in the ckpt/seg/kittisf directory.

In your pipeline, what confuses me is:

for ROUND in $(seq 1 2); do
    python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml --round ${ROUND}
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split train --round ${ROUND} --test_batch_size 2 --save
    python oa_icp.py config/seg/kittisf/kittisf_unsup_woinv.yaml --split val --round ${ROUND} --test_batch_size 2 --save
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split train --round ${ROUND} --test_batch_size 2
    python test_seg.py config/seg/kittisf/kittisf_unsup.yaml --split val --round ${ROUND} --test_batch_size 2
done

The command python test_seg.py config/seg/kittisf/kittisf_unsup.yaml will look for ckpt/seg/kittisf_unsup_R{ROUND}, which should not exist at this stage, because the training is performed via python train_seg.py config/seg/kittisf/kittisf_unsup_woinv.yaml and therefore only produces ckpt/seg/kittisf_unsup_woinv_R{ROUND}. So maybe intermediate results from previous runs are being used? It might be a good idea to delete all current results and train from the beginning.
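For example, a quick sanity check along these lines can reveal stale checkpoints (a minimal sketch; the checkpoint root is an assumption based on the default configs and the two expected directory names mentioned above, so adjust it to your own save path):

# Sketch only: flag checkpoint directories that a clean two-round run would not produce.
import os
ckpt_root = 'ckpt/seg/kittisf'  # assumption; adjust to the save path in your configs
expected = {'kittisf_unsup_woinv_R1', 'kittisf_unsup_R2'}
for name in sorted(os.listdir(ckpt_root)):
    status = 'expected' if name in expected else 'stale (consider deleting before retraining)'
    print(name, '->', status)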

2. Testing in Round 2

If you have fixed the experiment pipeline, there might be another reason for the failure to reproduce. As you can see, in testing we load the model with the lowest validation loss: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/test_seg.py#L80-L83 However, this may not be the model with the best performance. I have occasionally encountered such a case before, as shown in the attached loss/metric curves.

A quick solution is to load the model from the final epoch: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/test_seg.py#L76-L79 Or you can save models from different epochs during training and select among them: https://github.com/vLAR-group/OGC/blob/3afbf55159a795b8e483602dceedb4315817da43/train_seg.py#L212-L217
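As a generic illustration of that last option (this is a sketch, not the repository's actual code; the key names, path, and epoch interval are placeholders), per-epoch checkpointing could look like:

# Generic per-epoch checkpointing sketch; names, path, and interval are placeholders.
import os
import torch

def save_epoch_checkpoint(model, optimizer, epoch, save_dir, every=10):
    if epoch % every != 0:
        return
    os.makedirs(save_dir, exist_ok=True)
    state = {'epoch': epoch,
             'model_state': model.state_dict(),
             'optimizer_state': optimizer.state_dict()}
    torch.save(state, os.path.join(save_dir, 'epoch_%03d.pth.tar' % epoch))

You can then evaluate the saved epochs and keep whichever gives the best validation metric.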

The training log on TensorBoard can also help you debug. I've reproduced the results with different random seeds and we always get a ~50 F1 score and ~40 PQ score (see the attached multi-seed results).

baurst commented 2 years ago

Thank you very much for your help and detailed explanation! I will delete the intermediate results and try again. :)

I did not know that the experiment has TensorBoard support! I could never find any TensorBoard logs, so I assumed no TensorBoard logging was active. It turns out the summaries were not written because the log_dir did not exist for me, and thus no TensorBoard files could be written. I created PR #3, which creates the log_dir prior to running, so that others can have TensorBoard as well.
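For reference, the fix amounts to creating the directory before the summary writer is constructed, along these lines (variable names here are illustrative; the actual change is in the PR):

# Illustrative only; in train_seg.py the log directory comes from the config.
import os
log_dir = 'path/to/log_dir'          # placeholder
os.makedirs(log_dir, exist_ok=True)  # make sure it exists before the writer is created

After that, tensorboard --logdir <log_dir> shows the training curves.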