nv-tlabs / DIB-R

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer (NeurIPS 2019)
https://nv-tlabs.github.io/DIB-R/
MIT License
657 stars 110 forks

Question about the usage of CUB dataset #2

Closed hideonInternet closed 4 years ago

hideonInternet commented 5 years ago

Hi, nice work! Do you use the CUB-200-2011 dataset provided by CMR? It is different from the official CUB dataset, and the results presented in your paper differ from the results I got when evaluating on the data provided by CMR. By the way, I used the model trained for 500 epochs, and the test set contains 2874 images. If not, could you provide the dataset you used to train and evaluate your model?

hideonInternet commented 5 years ago

After 500 epochs: using GT cam: IoU 0.735, PCK@0.1 0.921, PCK@0.15 0.982; using pred cam: IoU 0.703, PCK@0.1 0.812, PCK@0.15 0.930
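For reference, here is a minimal sketch of how metrics of this kind are typically computed: mask IoU over binary silhouettes, and PCK as the fraction of visible keypoints that fall within alpha times the image size of the ground truth. The exact evaluation code used in CMR/DIB-R may differ.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """IoU between two binary silhouette masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / max(int(union), 1)

def pck(pred_kps, gt_kps, visible, img_size, alpha=0.1):
    """Fraction of visible keypoints predicted within alpha * img_size of the GT."""
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    correct = (dists < alpha * img_size) & visible.astype(bool)
    return correct.sum() / max(int(visible.sum()), 1)
```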

wenzhengchen commented 5 years ago

Hi, we trained CMR following the instructions at https://github.com/akanazawa/cmr/blob/master/doc/train.md on the official CUB-200-2011 dataset.

To be specific, we downloaded the CUB dataset from http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz, and the test set contains 2874 images.

hideonInternet commented 5 years ago

So instead of using the CMR model provided at https://github.com/akanazawa/cmr, you trained another CMR model, with the original differentiable renderer, on the official CUB-200-2011 dataset.

How did you preprocess the GT segmentations? The pixel values of the GT segmentations provided by the official dataset range from 0 to 255, but the GT segmentations shown in your paper are either 0 or 255. If your preprocessing differs from CMR's, that could lead to different results. Could you please provide the GT segmentations you used to train your CMR model?

wenzhengchen commented 5 years ago

Hi,

In training, we didn't apply any extra processing to the GT segmentations. We use the code from CMR to handle the GT, so it should be exactly the same as in CMR.

As for the image in the paper, do you mean the segmentations in the supplementary material? If so, they are binarized only for visualization.
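For illustration, binarizing a soft 0-255 mask for display could look like the sketch below; the file names and the threshold of 127 are assumptions, not the actual paper code.

```python
import numpy as np
from PIL import Image

# Hypothetical file name; the official CUB segmentations are soft masks in [0, 255].
mask = np.array(Image.open("gt_segmentation.png").convert("L"))

# Binarize purely for visualization (threshold is an assumption).
binary = (mask > 127).astype(np.uint8) * 255
Image.fromarray(binary).save("gt_segmentation_binary.png")
```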

hideonInternet commented 5 years ago
[Two attached images comparing GT segmentations.]

The left image is the GT segmentation used by CMR for training, and the right one is the corresponding GT segmentation provided by the official dataset. You used the right segmentations to train your own "CMR" model, while the original CMR model was trained on the left segmentations, which would lead to different results. Am I right?

wenzhengchen commented 4 years ago

Hi,

No, I would say for the GT we do exactly the same as the CMR paper and use the left image for training.

As stated in https://github.com/akanazawa/cmr/blob/master/doc/train.md (see the screenshot below):

[Screenshot of the data preparation instructions from the CMR train.md]

First, we download the original CUB dataset. Next, we run the preprocessing script from CMR, which processes everything (images and segmentations). We didn't do any data modification ourselves. In short, our training data should be the same as CMR's.
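If you want to sanity-check that the CMR preprocessing leaves the segmentations unchanged, a quick pixel-wise comparison along these lines would do; the paths are placeholders, not the actual file layout of either repo.

```python
import numpy as np
from PIL import Image

# Placeholder paths: one mask as produced by the CMR preprocessing pipeline,
# one from the official CUB-200-2011 segmentations directory.
cmr_mask = np.array(Image.open("cmr_data/segmentations/bird_0001.png").convert("L"))
official_mask = np.array(Image.open("CUB_200_2011/segmentations/bird_0001.png").convert("L"))

# Compare after binarization, since the official masks contain soft (0-255) values.
print("identical after binarization:",
      np.array_equal(cmr_mask > 127, official_mask > 127))
```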

As for the different scores, you reported:

using GT cam: IoU 0.735, PCK@0.1 0.921, PCK@0.15 0.982

Our reported numbers using GT cam are IoU 0.738 and PCK@0.1 0.930.

I think the scores are similar and the difference is because of training noise.

hideonInternet commented 4 years ago

I got it, thank you so much for your patience!