Unfair Comparison - Githubissues

ouenal commented 2 years ago

In your paper (in Tab.2) I see that you compare to Jiang et al. (GPC). In GPC, the authors share that "for SemanticKITTI, considering that adjacent frames could have very similar contents", they try their best "to ensure that labeled and unlabeled data do not come from the same sequence." This implies that their labeled and unlabeled data split do not have the variety as your uniform sampling, thus a direct comparison is unfair.

ldkong1205 commented 2 years ago

Hi @ouenal, thank you for raising this question!

We noticed from Jiang et al. (GPC) that they didn't uniformly sample scans from the whole dataset. We answer this question from the following perspectives:

The sampling advantage only takes effect for lower proportions like 5% and 10% and will be washed out with higher proportions, where LaserMix still outperforms GPC by large margins.
We got in touch with the authors of GPC before. Unfortunately, they were not willing to open-source their code, nor send their code to us for reproducing. Therefore, we have to report their original results. We will add a detailed description of this in our revised paper for better clarity.
We believe that uniformly sample scans from the whole collected dataset would be a more practical option. As we discussed before in this issue, data collection is an easier process than annotation and we can collect huge numbers of raw scans from the LiDAR sensors. The problem now becomes how to select which scans to label with a fixed budget. Since adjacent scans share very similar contents, it's more reliable to label more diverse contents by uniformly sampling scans. That's why we follow this setting in our experiments.

Hope the above answers your concerns. Please let us know if you have any other questions!

ouenal commented 2 years ago

I would have to disagree with two of the statements that you've made.

Uniform sampling would always have an advantage over a split that considers adjacency in frames. Here is a simple example: In 50% labeled frames, you would be sampling every other frame, you would have a great representation of the entire dataset. From Jiang et al.'s description, my assumption would be that they fully label only a subset of sequences e.g. 0,1,2,3 and leave the rest unlabelled (the sequence indices are just an example and might not add up to 50% labeled frames). This means they have no information from the remaining sequences at all. This is a big difference.
The way LiDAR sequences are labeled directly contradicts with the second statement. LiDAR frames are labeled through aggregation on a global coordinate system. As most stuff (e.g. building, road) and even things (e.g. parked cars) are static in outdoor environments, aggregation allows us to label everything only once, saving a lot of time. This means labeling single frames uniformly across an entire dataset is not at all realistic, and I would argue would save no additional time. My suggestion would be to simply remove this table all together. I think the results on table 1 are sufficient to show the reader the effectiveness of the mixing strategy.

ldkong1205 commented 2 years ago

Hi @ouenal, thanks for the follow-ups!

For the first comment:

Yes, we do agree with you that uniform sampling has the advantage over sequential sampling. However, please note from Jiang et al. (GPC) that, they didn't mention they are selecting the first xx or the last xx scans of the whole dataset as labeled. They could have selected the first or the last xx scans from each sequence, where the diversity should not be a big difference compared with uniform sampling.
Jiang et al. (GPC) are not providing details like this for others to reproduce. Therefore, we cannot fairly compare them without such information.

For the second comment:

We do get what you mean and we believe your comment is reasonable from the perspective of LiDAR scan annotation. However, as we are working in semi-supervised learning, mostly from a learning perspective, where the conventional way of setting up proportions is to uniformly sample data from the whole dataset and assume they are labeled. Examples like this are the semi-supervised semantic segmentation with VOC and Cityscapes.
Your mentioned setting, i.e., sampling sequential scans, will bring extra domain gaps. For example, the different sequences might be collected from different locations, weather, or lighting conditions. This will bring extra complexity, as we have tested before that LiDAR scans under different scenarios with domain discrepancy are not fully transferable. Considering that this is one of the first works in semi-supervised LiDAR segmentation and from a semi-supervised learning perspective, we should not cope with things like this to entangle the semi-supervision with unsupervised domain adaptation.
Your suggestion makes sense. But Jiang et al. (GPC) is actually the closest work to ours in semi-supervised LiDAR segmentation. As suggested, I will discuss about keeping or removing Table 2 with my co-authors. Nonetheless, to make the comparison with GPC a more fair one, we will supplement experimental results that adopt the sequential sampling strategy. We will emphasize concerns like yours in our paper and update the results in our revised version.

Thanks again for the comment and suggestion. Please let us know if you have any other questions!

ouenal commented 2 years ago

Thanks for the back and forth. I'm sure that we will keep having some things to disagree on but it's a valuable discussion to have nonetheless. Data efficiency in LiDAR segmentation is still a fairly new topic and we have quite a lot to reseach and improve here as a community. Keep up the good work!

ldkong1205 commented 2 years ago

Hi @ouenal, thank you so much for sharing your thoughts and experience with us! Your comments have enlightened us to consider more practical scenarios when conducting experiments.

Yep, data-efficient LiDAR perception is the blue ocean, and let's keep up exploring more!

ldkong1205 commented 2 years ago

Hi @ouenal, long time no see! Here are some follow-ups for this issue:

Recently, we got in touch with the authors of GPC again and they shared with us the split that they used in their experiments.
We will run experiments on their split and report the results in our Repo.
We will let you know once the results are ready.

yyliu01 commented 1 year ago

Hi @ldkong1205, Is there any news about this different set-up?

ldkong1205 commented 1 year ago

Hi @ldkong1205, Is there any news about this different set-up?

Hi @yyliu01, thanks for your interest in this work!

We recently updated this codebase to the MMDetection3D format. Currently, we have released the config for random data splits on SemanticKITTI; we will include config files for nuScenes and Waymo Open soon.
For the sequential setup, we have generated the data split files and we are about to run experiments under such a setting.
Since this can be regarded as a brand new benchmark, we will need to run everything again. We would expect a release in the following weeks.
We will keep you updated and let you know once the code and results are available (which should be soon).

yyliu01 commented 1 year ago

Hi @ldkong1205,

Thanks so much for the solid work. We will follow up once the results have been released.

Best Regards, Yuyuan

ldkong1205 / LaserMix

Unfair Comparison #2