ldkong1205 / LaserMix

[CVPR 2023 Highlight] LaserMix for Semi-Supervised LiDAR Semantic Segmentation
https://ldkong.com/LaserMix
Apache License 2.0

Unfair Comparison #2

Closed: ouenal closed this issue 2 years ago

ouenal commented 2 years ago

In your paper (Tab. 2), I see that you compare to Jiang et al. (GPC). In GPC, the authors share that "for SemanticKITTI, considering that adjacent frames could have very similar contents", they try their best "to ensure that labeled and unlabeled data do not come from the same sequence." This implies that their labeled/unlabeled split does not have the same variety as your uniform sampling, so a direct comparison is unfair.

ldkong1205 commented 2 years ago

Hi @ouenal, thank you for raising this question!

We noticed from Jiang et al. (GPC) that they didn't uniformly sample scans from the whole dataset. We answer this question from the following perspectives:

Hope the above answers your concerns. Please let us know if you have any other questions!

ouenal commented 2 years ago

I would have to disagree with two of the statements that you've made.

  1. Uniform sampling would always have an advantage over a split that accounts for adjacency between frames. Here is a simple example: with 50% labeled frames, you would be sampling every other frame, so you would have a great representation of the entire dataset. From Jiang et al.'s description, my assumption is that they fully label only a subset of sequences, e.g. 0, 1, 2, 3, and leave the rest unlabeled (the sequence indices are just an example and might not add up to 50% labeled frames). This means they have no information from the remaining sequences at all. This is a big difference.
  2. The way LiDAR sequences are labeled directly contradicts the second statement. LiDAR frames are labeled through aggregation in a global coordinate system. As most stuff (e.g. building, road) and even things (e.g. parked cars) are static in outdoor environments, aggregation allows us to label everything only once, saving a lot of time. This means labeling single frames uniformly across an entire dataset is not at all realistic and, I would argue, would save no additional time.

My suggestion would be to simply remove this table altogether. I think the results in Table 1 are sufficient to show the reader the effectiveness of the mixing strategy.

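
To make the difference between the two split protocols concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the sequence lengths are made up, and `uniform_split` and `sequence_split` are hypothetical helpers, not code from LaserMix or GPC. It only shows that, at the same 50% labeling ratio, uniform sampling places labeled frames in every sequence, while a sequence-level split leaves some sequences without any labeled frame.

```python
from typing import Dict, List, Set, Tuple

Frame = Tuple[int, int]  # (sequence_id, frame_index)


def build_frames(seq_lengths: Dict[int, int]) -> List[Frame]:
    """Flatten all sequences into one ordered list of frames."""
    return [(s, i) for s, n in sorted(seq_lengths.items()) for i in range(n)]


def uniform_split(frames: List[Frame], step: int) -> Tuple[List[Frame], List[Frame]]:
    """Label every `step`-th frame across the whole dataset."""
    labeled = frames[::step]
    labeled_set = set(labeled)
    unlabeled = [f for f in frames if f not in labeled_set]
    return labeled, unlabeled


def sequence_split(frames: List[Frame], labeled_seqs: Set[int]) -> Tuple[List[Frame], List[Frame]]:
    """Label whole sequences; labeled and unlabeled frames never share a sequence."""
    labeled = [f for f in frames if f[0] in labeled_seqs]
    unlabeled = [f for f in frames if f[0] not in labeled_seqs]
    return labeled, unlabeled


def covered_sequences(frames: List[Frame]) -> Set[int]:
    """Return the set of sequence ids that appear in `frames`."""
    return {s for s, _ in frames}


# Toy dataset (hypothetical sizes): 4 sequences, 20 frames in total.
seq_lengths = {0: 6, 1: 4, 2: 6, 3: 4}
frames = build_frames(seq_lengths)

# Both protocols label 10 of 20 frames (50%).
uni_l, uni_u = uniform_split(frames, step=2)
seq_l, seq_u = sequence_split(frames, labeled_seqs={0, 1})

print("uniform: labeled sequences  =", sorted(covered_sequences(uni_l)))
print("sequence: labeled sequences =", sorted(covered_sequences(seq_l)))
```

In this toy setting, the uniform split also puts neighboring frames of the same sequence on both sides of the labeled/unlabeled boundary, which is exactly the overlap the sequence-level split is designed to avoid.
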
ldkong1205 commented 2 years ago

Hi @ouenal, thanks for the follow-ups!

For the first comment:

For the second comment:

Thanks again for the comment and suggestion. Please let us know if you have any other questions!

ouenal commented 2 years ago

Thanks for the back and forth. I'm sure that we will keep having some things to disagree on, but it's a valuable discussion to have nonetheless. Data efficiency in LiDAR segmentation is still a fairly new topic, and we have quite a lot to research and improve here as a community. Keep up the good work!

ldkong1205 commented 2 years ago

Hi @ouenal, thank you so much for sharing your thoughts and experience with us! Your comments have enlightened us to consider more practical scenarios when conducting experiments.

Yep, data-efficient LiDAR perception is a blue ocean, so let's keep exploring!

ldkong1205 commented 2 years ago

Hi @ouenal, long time no see! Here are some follow-ups for this issue:

yyliu01 commented 1 year ago

Hi @ldkong1205, is there any news about this different setup?

ldkong1205 commented 1 year ago

> Hi @ldkong1205, is there any news about this different setup?

Hi @yyliu01, thanks for your interest in this work!

yyliu01 commented 1 year ago

Hi @ldkong1205,

Thanks so much for the solid work. We will follow up once the results have been released.

Best Regards, Yuyuan