Thanks for pointing that out, I will update the script so that it directly uses the output from HAIS together with the confidence scores.
Thank you, I will be looking forward to the update :)
Maybe it would be convenient to modify mergeResult.py so that it reads and keeps the confidence scores? For example, something along these lines:
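(This is just my own sketch, assuming HAIS writes the usual ScanNet-style result files where each line is "<relative mask path> <label id> <confidence>"; the function and field names are hypothetical, not the actual mergeResult.py code.)

import os

def read_instances_with_confidence(result_dir, scene_name):
    # Read one HAIS result file and keep the per-instance confidence
    instances = []
    with open(os.path.join(result_dir, scene_name + '.txt')) as f:
        for line in f:
            mask_rel_path, label_id, confidence = line.split()
            instances.append({
                'mask_file': os.path.join(result_dir, mask_rel_path),
                'label_id': int(label_id),
                'confidence': float(confidence),  # keep the score instead of discarding it
            })
    return instances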
The script is updated; I will add instructions on how to use it to the main page later. Basically, it just uses the direct output from HAIS under the result folder and does not merge the blocks at all...
OK, I think the issue is resolved :)
https://github.com/meidachen/STPLS3D/blob/9d8b2d171ef541eea6d10a95cc7c4ce230fac21a/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L436 -- It is not very clear where one should get the ground truth for the Test files in numpy format. It can be generated here: https://github.com/meidachen/STPLS3D/blob/9d8b2d171ef541eea6d10a95cc7c4ce230fac21a/HAIS/data/prepare_data_inst_instance_stpls3d.py#L128-L130 but those lines are commented out by default, and they are also named for validation, not test. I switched the loading back to .txt, since that is the default dataset format:
gt = os.path.join(data_path, img_id + '.txt')
data = pd.read_csv(gt, header=None).values
Hi! In your evaluation script you fill the confidence with dummy values of 1: https://github.com/meidachen/STPLS3D/blob/65917491c6a507b97c1d1ed60dcffd418524e3d8/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L527-L528
However, when plotting the precision/recall curve, the unique confidence values are used to sort the points and to decide at how many points the precision-recall values are sampled: https://github.com/meidachen/STPLS3D/blob/65917491c6a507b97c1d1ed60dcffd418524e3d8/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L161-L162 As a result, the number of unique recall values is always 1: whatever the number of instances with a given semantic label, the precision-recall curve always has 2 points (one of them artificial). Is this the correct way to calculate the area under the precision-recall curve?
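To make this concrete, here is a tiny standalone illustration (not the repo's code) of how the threshold sampling collapses when every confidence is the dummy value 1:

import numpy as np

confidences = np.ones(50)  # 50 predictions, all with the dummy confidence 1.0
thresholds, unique_indices = np.unique(confidences, return_index=True)
num_prec_recall = len(unique_indices) + 1  # +1 for the artificial endpoint
print(thresholds)       # [1.]
print(num_prec_recall)  # 2 -> the curve is always sampled at only two points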
Moreover, when searching for ground truth - prediction matches, the first match with overlap > threshold is considered a true positive, while all later matches are automatically marked as false positives: https://github.com/meidachen/STPLS3D/blob/65917491c6a507b97c1d1ed60dcffd418524e3d8/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L98-L101 -- this assumes the predictions were sorted by confidence beforehand; otherwise a sub-optimal prediction that does not have the highest confidence could become the true positive. Do you think this can happen? A toy sketch of the scenario I have in mind is below.
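(This is my own simplification with hypothetical names, not the evaluation script; the sorted() call is the step whose absence worries me.)

def point_iou(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def greedy_match(gt_masks, predictions, iou_threshold=0.5):
    # predictions: list of (point_indices, confidence) tuples
    predictions = sorted(predictions, key=lambda p: p[1], reverse=True)  # without this sort a weak prediction can take the TP slot
    matched_gt = set()
    labels = []
    for pred_points, _conf in predictions:
        hit = next((i for i, gt in enumerate(gt_masks)
                    if i not in matched_gt and point_iou(pred_points, gt) > iou_threshold), None)
        if hit is None:
            labels.append('FP')  # no unmatched ground truth instance overlaps enough
        else:
            matched_gt.add(hit)
            labels.append('TP')  # first sufficient overlap wins; later matches for the same GT become FP
    return labels

gt = [[0, 1, 2, 3]]
preds = [([0, 1, 2], 0.2), ([0, 1, 2, 3], 0.9)]
print(greedy_match(gt, preds))  # ['TP', 'FP'] with the sort; without it the 0.2 prediction would take the TP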