meidachen / STPLS3D

🔥 Synthetic and real-world 2d/3d dataset for semantic and instance segmentation (BMVC 2022 Oral)

Which area is used as the testing area for STPLS3D semantic segmentation? #12

Open whuhxb opened 2 years ago

whuhxb commented 2 years ago

Hi @meidachen

In the code for RandLA-Net, SCF-Net, and KPConv, and following Tables 3 and 6 of the paper, which area of the RealWorldData is used as the testing area? Is it WMSC_split, or another one? I want to make a fair comparison.

The STPLS3D dataset is only split into training and validation, right? Not training, validation, and testing.

In addition, data_preparation_STPLS3D.py in RandLA-Net and SCF-Net, even when using .txt files, points to RealWorldData for both training and testing, not to the Synthetic dataset. To train on both RealWorldData and Synthetic data and test on one area of RealWorldData, the dataset preparation may need to follow data_preparation_STPLS3D.py in KPConv.

Thanks.

meidachen commented 2 years ago

WMSC is the validation data for the experiments reported in Tables 3 and 6 of the paper. To replicate our results, I would suggest starting with KPConv (since we have provided simple instructions to run it) and making sure everything works there, then moving on to RandLA-Net and SCF-Net.

whuhxb commented 2 years ago

Hi @meidachen

OK, thanks. I will run KPConv first and then move to RandLA-Net and SCF-Net. In addition, I want to confirm: the STPLS3D dataset is only split into training and validation, right? Not training, validation, and testing. So during training, WMSC is used for evaluation to select the best model, and then WMSC is also used to test that model?

meidachen commented 2 years ago

That is correct. In the paper we tested the trained model on another dataset (FDc), which cannot be released. So on the released dataset, you can either validate and test on WMSC, or do cross-validation using all four real-world datasets.

whuhxb commented 2 years ago

Hi @meidachen

"So on the released dataset, you can either validate and test on WMSC." This is similar to S3DIS with Area 5 as the testing area. But I think it is not reasonable to use Area 5 for validation and testing at the same time; validation and testing sets should not intersect. Alternatively, if WMSC is used only for testing and not for validation, reporting the average over 3 or 5 runs is also OK.

Thanks.

meidachen commented 2 years ago

You are right, it is better to have validation and testing sets that do not intersect, and yes, I was following S3DIS (testing on Area 5) when releasing STPLS3D. One of the main reasons we can't really do a train, validation, and test split is the lack of real-world data. There are two ways of doing it: 1) use, say, OCCC or RA as validation, but then we lose one scene for training the model; 2) use part of, say, USC, OCCC, or RA as validation, but then the model could overfit, since the validation and training sets overlap in the sense that they come from the same area and may share similar properties.
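
Option 1 can be written down as a fixed scene-level split. The following is only a minimal sketch, assuming WMSC stays the test scene as in the paper tables; `make_split` is a hypothetical helper and is not part of the STPLS3D repository.

```python
# Scene names are taken from this thread. make_split is a hypothetical helper,
# not part of the STPLS3D repository.
REAL_WORLD_SCENES = ["USC", "OCCC", "RA", "WMSC"]


def make_split(scenes, val_scene, test_scene):
    """Assign whole scenes to train / val / test so that no scene is shared."""
    if val_scene == test_scene:
        raise ValueError("validation and test scenes must differ")
    for name in (val_scene, test_scene):
        if name not in scenes:
            raise ValueError(f"unknown scene: {name}")
    # Every scene that is neither validation nor test is left for training.
    train = [s for s in scenes if s not in (val_scene, test_scene)]
    return {"train": train, "val": [val_scene], "test": [test_scene]}


split = make_split(REAL_WORLD_SCENES, val_scene="OCCC", test_scene="WMSC")
print(split)
```

With this split only two real-world scenes remain for training, which is exactly the cost described in option 1.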

meidachen commented 2 years ago

"Or, if not using WMSC as validation, only as testing, 3 or 5 times average is also OK."

In this case, I think cross-validation would be a better option.
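
For illustration, cross-validation over the four real-world scenes could be set up as leave-one-scene-out folds. This is a rough sketch under assumptions: the scene names come from this thread, while `leave_one_scene_out` and `mean_miou` are hypothetical helpers, and the training and evaluation of each fold are left to whichever network code is used.

```python
# Scene names are taken from this thread; the fold layout is an assumption,
# not the procedure used in the paper.
REAL_WORLD_SCENES = ["USC", "OCCC", "RA", "WMSC"]


def leave_one_scene_out(scenes):
    """Return one (train_scenes, held_out_scene) pair per scene."""
    folds = []
    for held_out in scenes:
        train = [s for s in scenes if s != held_out]
        folds.append((train, held_out))
    return folds


def mean_miou(per_fold_miou):
    """Average the mIoU values obtained on the held-out scene of each fold."""
    return sum(per_fold_miou) / len(per_fold_miou)


folds = leave_one_scene_out(REAL_WORLD_SCENES)
for train, held_out in folds:
    print(f"train on {train}, evaluate on {held_out}")
```

Each scene is held out exactly once, so the reported number would be the mean over four folds rather than a single WMSC score.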

volare1996 commented 2 years ago

An error occurs during model fine-tuning with RandLA-Net when the two sets of data have inconsistent category labels. Could you post the fine-tuning code? Thanks!

meidachen commented 2 years ago

Hi @volare1996,

Which two sets of data are you using?