pqhieu / jsis3d

[CVPR'19] JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds
https://pqhieu.com/research/cvpr19/
MIT License

Cannot reproduce paper results #3

rig8f closed this issue 5 years ago

rig8f commented 5 years ago

I've gone through the paper and run the code for MT-PNet with the commands specified in the README and the same configuration present in s3dis.json, but I cannot obtain the results reported in the paper.

After 100 epochs of training, evaluation gives me

"mean_accuracy": 0.4461
"mean_IoU": 0.3441
"mean_precision": 0.2504
"mean_recall": 0.2118

and going on to 200 epochs results in

"mean_accuracy": 0.4617
"mean_IoU": 0.3600
"mean_precision": 0.2795
"mean_recall": 0.2330

Am I doing something wrong? Do I need to change some parts or adjust parameters? Let me know if you need more information. Thank you!

pqhieu commented 5 years ago

Hi @rig8f,

Sorry for the late reply. Could you attach the train.log and eval.json files from your log folder?

Thank you.

rig8f commented 5 years ago

Here are the requested logs for both runs (I appended .log to the eval.json files due to GitHub's file-type restrictions). Let me know if you need more files or information, or if you have any suggestions.

Thank you.

train100e.log eval100e.json.log train200e.log eval200e.json.log

pqhieu commented 5 years ago

Did you change any parameters in the configuration (batch size, learning rate, etc.)? Your per-class accuracy is quite low compared to the numbers I usually get after 100 epochs.

I just retrained another model from scratch. Attached here is the log folder for reference.

You can try saving additional checkpoints instead of only the best model. In my experience, the best results are often achieved after ~40 epochs.
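For example, adding a few lines like these to the training loop keeps periodic snapshots alongside the best model (a minimal sketch only; the function and variable names are illustrative and not taken from this repository):

```python
# Sketch only: save a checkpoint every few epochs in addition to the best model.
# All names (model, optimizer, logdir, metric) are illustrative, not from train.py.
import os
import torch

def save_checkpoints(model, optimizer, epoch, metric, best_metric, logdir, every=10):
    state = {
        'epoch': epoch,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }
    # Periodic snapshot, e.g. every 10 epochs, so earlier models stay available.
    if epoch % every == 0:
        torch.save(state, os.path.join(logdir, 'checkpoint_{:03d}.pth'.format(epoch)))
    # Still keep track of the single best model as before.
    if metric > best_metric:
        best_metric = metric
        torch.save(state, os.path.join(logdir, 'best_model.pth'))
    return best_metric
```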

Hope this helps.

rig8f commented 5 years ago

Yes, it definitely helped me understand the problem. The numbers in my first comment come from a small modification to the final part of eval.py, where I also save the mean value of each metric (simply using np.mean(accu), for example, which is also what gets printed to stdout).
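Roughly, the added lines look like this (a sketch of my change; apart from accu, the per-class array names are just what I call them here, not necessarily the ones used in eval.py):

```python
# Sketch of the extra lines at the end of eval.py: dump the mean over classes of
# each per-class metric. Variable names other than `accu` are assumptions.
import json
import numpy as np

def dump_mean_metrics(accu, iou, prec, recall, path='eval.json'):
    metrics = {
        'mean_accuracy': float(np.mean(accu)),
        'mean_IoU': float(np.mean(iou)),
        'mean_precision': float(np.mean(prec)),
        'mean_recall': float(np.mean(recall)),
    }
    with open(path, 'w') as f:
        json.dump(metrics, f, indent=2)
    return metrics
```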

But if we look at the per-class accuracy values from a training run from scratch with the same parameters as in your s3dis.json (the eval.json file), they are comparable: some classes are detected much better than others, and the poorly detected ones drag the mean down, as I should have figured out before opening this issue.
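As a toy illustration (numbers made up, not from my actual log), a handful of poorly detected classes is enough to pull the mean well below the scores of the well-detected ones:

```python
import numpy as np

# Hypothetical per-class accuracies for the 13 S3DIS classes: a few dominant
# classes score high while rarer classes score close to zero.
per_class_accuracy = np.array([0.92, 0.95, 0.75, 0.05, 0.02, 0.40, 0.35,
                               0.10, 0.55, 0.30, 0.45, 0.08, 0.45])
print(np.mean(per_class_accuracy))  # ~0.41, even though three classes are above 0.7
```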

Anyway, I see now that you have updated the scripts to include the overall accuracy metric and several other interesting things. Thank you!

ZhengdiYu commented 3 years ago

> I've gone through the paper and run the code for MT-PNet with the commands specified in the README and the same configuration present in s3dis.json, but I cannot obtain the results reported in the paper.
>
> After 100 epochs of training, evaluation gives me
>
> "mean_accuracy": 0.4461
> "mean_IoU": 0.3441
> "mean_precision": 0.2504
> "mean_recall": 0.2118
>
> and going on to 200 epochs results in
>
> "mean_accuracy": 0.4617
> "mean_IoU": 0.3600
> "mean_precision": 0.2795
> "mean_recall": 0.2330
>
> Am I doing something wrong? Do I need to change some parts or adjust parameters? Let me know if you need more information. Thank you!

@rig8f May I ask where the inconsistency is? Did I misunderstand something?

As I see in the paper, the S3DIS results for Ours (MT-PNet) are: mAP 24.9, with per-class values 71.5, 78.4, 28.3, 24.4, 3.5, 12.1, 36.2, 10, 12.6, 34.5, 12.8. And the "mean_precision" you get is quite similar.

rig8f commented 3 years ago

@ZhengdiYu No, it was my fault, as I explained in this previous comment.

ZhengdiYu commented 3 years ago

> @ZhengdiYu No, it was my fault, as I explained in this previous comment.

Thank you for your quick reply. By the way, I don't really understand the results; I hope you can help me as well.

  1. So does "mean_precision" stand for mAP in the paper? But in the eval.json the author gave you, it says:

    "mean_recall": 0.26129484999175207,
    "mean_precision": 0.3318457691464621,
    "mean_IoU": 0.4134808521564693,
    "mean_accuracy": 0.49937982103730627

    The mean precision here is 33%, which is higher than the result reported in the paper (24.9), so I guess maybe it's not mAP? (I put a small generic sketch of how the two metrics can differ after this list.)

  2. There are 11 classes reported in the paper, but 13 classes in the results log file?

  3. Does the "overall accuracy" in eval.py stand for the "mAcc" for semantic segmentation in the paper?
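Here is the generic sketch I mentioned under question 1. It only shows why per-class precision at a single detection cut-off and AP over the ranked precision-recall curve usually give different numbers; it is not what eval.py actually computes, just my assumption about where the gap could come from.

```python
import numpy as np

# Toy example for one class: 5 predicted instances sorted by confidence,
# matched against 4 ground-truth instances (1 = matched with IoU > 0.5).
is_tp = np.array([1, 1, 0, 1, 0])
num_gt = 4

# Precision if we simply keep every prediction (one fixed cut-off):
precision_at_cutoff = is_tp.sum() / len(is_tp)          # 3 / 5 = 0.6

# AP accumulates precision as recall grows while sweeping down the ranked list
# (un-interpolated version, for simplicity):
tp_cum = np.cumsum(is_tp)
precision = tp_cum / np.arange(1, len(is_tp) + 1)
recall = tp_cum / num_gt
ap = recall[0] * precision[0] + np.sum((recall[1:] - recall[:-1]) * precision[1:])
print(precision_at_cutoff, ap)  # 0.6 vs ~0.69: the two metrics need not agree
```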