minghanqin / LangSplat

Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting" [CVPR2024 Highlight]
https://langsplat.github.io/

Concerns about Dataset Usage and Discrepancies in Experimental Results #50

Open epsilontl opened 3 weeks ago

epsilontl commented 3 weeks ago

Dear Author,

I hope this message finds you well.

I have some concerns regarding the experimental setup and results presented in your paper, which I hope you can clarify.

1. Dataset Usage Issue: According to your code, the entire dataset, including the test set, is used for training. Isn't this setup problematic?

2. Discrepancies in Experimental Results: I re-trained the model after separating the training and test sets according to standard practice (see the sketch after this list), and the mIoU results I obtained differ significantly from those reported in the paper (ramen: 51.2, teatime: 65.1, waldo_kitchen: 44.5, figurines: 44.7).

3. Acknowledgment of Test Set Inclusion: Do you acknowledge that the test set was included in the training process? If so, do you believe the paper should be retracted under these circumstances?
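For reference, here is a minimal sketch of what I mean by standard practice above; it follows the LLFF-style holdout that vanilla 3DGS applies with its --eval flag (every 8th view reserved for testing). The function and variable names are illustrative, not taken from the LangSplat codebase.

```python
# Illustrative sketch of the standard held-out split ("llffhold"
# convention): every 8th captured view is reserved for evaluation
# and is never used to optimize the model.

def split_train_test(image_paths, llffhold=8):
    """Hold out every `llffhold`-th image for testing."""
    train = [p for i, p in enumerate(image_paths) if i % llffhold != 0]
    test = [p for i, p in enumerate(image_paths) if i % llffhold == 0]
    return train, test

# Example: 24 captured views -> 21 train / 3 test, with no overlap.
views = [f"frame_{i:04d}.png" for i in range(24)]
train_views, test_views = split_train_test(views)
assert not set(train_views) & set(test_views)  # no leakage between splits
```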

I look forward to your response.

Thank you!

sangminkim-99 commented 3 days ago

Hi @epsilontl,

In the 3D Open-Vocabulary Semantic Segmentation (3D-OVS) task, it seems common to use all images as the training set. Accordingly, those works do not report how closely the rendered RGB matches the ground truth; instead, they evaluate how well the trained model can segment the scene. Since training imposes no information from the ground-truth segmentation masks, using all images for training is arguably valid. A sketch of this point follows.
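To make that concrete, here is a rough sketch of the kind of objective involved. The names are illustrative and the simple L1 form is my assumption, not necessarily the paper's exact objective; the point is that supervision comes from CLIP features extracted from the training images themselves, and the evaluation masks never enter the loss.

```python
import torch

def language_feature_loss(rendered_feat, clip_feat):
    """L1 loss between per-pixel language features rendered from the
    3D Gaussians and CLIP features precomputed from the same input
    RGB image. Both tensors have shape (H, W, D); no ground-truth
    segmentation labels appear anywhere in this objective."""
    return torch.abs(rendered_feat - clip_feat).mean()
```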

Do you think we still need to separate the training and testing datasets?

epsilontl commented 1 day ago


Hi @sangminkim-99 ,

Thank you for your insightful response regarding the use of all images as the training set in the 3D-OVS task. I understand that you consider it common practice to train on the entire dataset in this context, but I would like to clarify my understanding further.

When you say "common", could you point to the sources or practices this is based on? To my knowledge, the standard approach in similar tasks is to separate the training and testing datasets so that the model's performance is evaluated on unseen data. For instance, among NeRF- and 3DGS-based OVS methods, the benchmark comparisons of LangSplat against FFD, 3D-OVS, and LERF, as well as the more recent LEGaussian presented at CVPR 2024, all adhere to this principle by separating the training and testing sets and evaluating on novel viewpoints; a sketch of the corresponding evaluation protocol follows.
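To spell out what "evaluating on novel viewpoints" means in those benchmarks, here is a minimal sketch (illustrative names only, not any project's actual API): open-vocabulary masks are predicted on held-out test views and scored with IoU against annotations that were never seen during training.

```python
import numpy as np

def miou(pred_masks, gt_masks):
    """Mean IoU over binary masks, one pair per text query,
    computed only on held-out test views."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))
```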

I would greatly appreciate any further insights you can provide on this matter.