pmeletis / panoptic_parts

This repository contains code and tools for reading, processing, evaluating on, and visualizing Panoptic Parts datasets. Moreover, it contains code for reproducing our CVPR 2021 paper results.
https://panoptic-parts.readthedocs.io/en/stable
Apache License 2.0

Pascal Context Baseline #18

Closed lxtGH closed 3 years ago

lxtGH commented 3 years ago

Hi! I have several questions about your Pascal Context baseline. @DdeGeus @pmeletis @vincentwen1995

1. Did you use all 79 stuff labels during training? I saw that your CVPR paper only used 59 labels (39 stuff and 20 things).

2. Following up on question 1: in your evaluation settings (ppq_ppp_59_57_cvpr21_default_evalspec.yaml), it seems that several labels are not evaluated at all; only 57 part labels and 39 stuff labels are evaluated. Why? What about the results for the remaining classes? (A small inspection sketch follows this list.)

3. Since the panoptic parts labels already have full annotations of things, stuff, and parts, why use the Pascal Context (59-class) labels for training?

4. Do the remaining classes have conflicts with the Pascal Context dataset? If not, we can use the Panoptic Parts labels (39 stuff classes) directly.
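
For reference, here is a minimal sketch (not the repository's official API) for inspecting which labels that eval spec actually lists; the file name is taken from the question above, and the spec's internal key structure is not assumed, only printed:

```python
# Minimal sketch for inspecting an eval spec YAML; assumes only that the file
# referenced in the question is valid YAML, not any particular key layout.
import yaml

with open("ppq_ppp_59_57_cvpr21_default_evalspec.yaml") as f:
    spec = yaml.safe_load(f)

# Print the top-level entries; list/dict-valued keys typically hold the
# label ids or names that are actually evaluated.
for key, value in spec.items():
    if isinstance(value, (list, dict)):
        print(f"{key}: {len(value)} entries")
    else:
        print(f"{key}: {value}")
```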

lxtGH commented 3 years ago

Also, would you release the panoptic.json and *.png files for reference and PQ evaluation?

DdeGeus commented 3 years ago

Hi @lxtGH, thanks for your interest in our work!

To establish the state-of-the-art baselines for our paper, we tried to represent the current state of the art as well as possible. Therefore, we aimed to use existing trained models as much as possible for the subtasks of semantic segmentation, instance segmentation, and part segmentation. This also allows for easier comparison with earlier work.

The existing models on the Pascal VOC dataset were all trained with specific label definitions, following the respective conventions. For semantic segmentation, this meant training on the 59 commonly used classes from Pascal-Context. For part segmentation, it meant following the label definition from the paper presenting the BSANet network.

So, to answer your questions specifically:

  1. For semantic segmentation, we used the 59 classes that are commonly used when training on Pascal-Context.

  2. For PartPQ evaluation of the baselines presented in the paper, we only evaluate on the classes for which we have predictions. This means the 59 scene-level classes (20 things, 39 stuff) and 57 part classes. The results for the remaining classes would be 0, as there are no predictions for these classes (see the toy calculation after this list).

  3. To best represent the state of the art, we use existing trained models. These models were trained on the Pascal-Context dataset, but that training was not carried out by us.

  4. The classes of Pascal-Context are a subset of the classes in Pascal-Panoptic-Parts, so in that respect there are no conflicts. However, we found that the Pascal-Context version commonly used in the literature (https://sites.google.com/view/pasd/dataset) has several flaws and missed objects with respect to the original Pascal-Context (https://cs.stanford.edu/~roozbeh/pascal-context/), which we used to generate Pascal-Panoptic-Parts. That is why, for instance, the performance (in mIoU) of DeepLab-ResNeSt269 reported in their repo is 58.9, but it is 55.1 when evaluated on our dataset.
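
To make point 2 above concrete, here is a toy calculation (all numbers are made up, not results from the paper) showing how a class-averaged metric drops when classes without predictions are included with a score of 0:

```python
# Toy illustration with made-up per-class scores; not actual PartPQ results.
per_class_score = {"person": 0.55, "car": 0.48, "sky": 0.70}  # hypothetical values
num_evaluated = len(per_class_score)  # classes that actually have predictions
num_total = 5                         # pretend two more classes have no predictions at all

mean_evaluated = sum(per_class_score.values()) / num_evaluated
mean_all = sum(per_class_score.values()) / num_total  # missing classes contribute 0

print(f"mean over evaluated classes: {mean_evaluated:.3f}")  # 0.577
print(f"mean over all classes:       {mean_all:.3f}")        # 0.346
```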

Therefore, we would recommend using our dataset for training and evaluation, instead of the commonly used Pascal-Context. We also hope that people will use all the classes available in our dataset; we are already working on this ourselves.

For a fully fair comparison with the baselines in our CVPR paper, it would be ideal to train on the respective data we refer to in the experiments. However, for new work we suggest using the conflict-free PPP data.

lxtGH commented 3 years ago

Hi! Thanks for your reply, @DdeGeus. In my experience with segmentation, training on all 193 part classes may give worse results than training on the 57-class subset when both settings are evaluated on the 57 parts. This is mainly due to the long-tailed distribution of the data. So will your paper report results for training on all 193 part classes, for reference?
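
A rough way to see how severe that long tail is would be to count per-class pixel frequencies over the part ground truths. Below is a sketch, assuming flat single-channel label PNGs in a hypothetical part_labels/ directory; the actual Pascal-Panoptic-Parts ground truth uses a different encoding and would first need the repository's decoding utilities:

```python
# Rough sketch for inspecting the part-class pixel distribution.
# Assumes flat single-channel label PNGs (one class id per pixel) in a
# hypothetical "part_labels/" directory, not the actual PPP encoding.
from collections import Counter
from pathlib import Path

import numpy as np
from PIL import Image

counts = Counter()
for path in Path("part_labels").glob("*.png"):
    labels = np.array(Image.open(path))
    ids, freqs = np.unique(labels, return_counts=True)
    counts.update(dict(zip(ids.tolist(), freqs.tolist())))

# Print classes from most to least frequent to see how skewed the tail is.
for class_id, pixel_count in counts.most_common():
    print(class_id, pixel_count)
```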

DdeGeus commented 3 years ago

Hi @lxtGH, indeed, training on all 193 part classes will probably result in worse performance. So you're correct: it would definitely not be fair to compare results evaluated on the 57 classes when one network is trained on those 57 classes and the other on all 193.

Therefore, for a fair comparison of methods, it's probably best to train on the same classes that we used in the CVPR paper.

We do also plan to train models & establish baselines for all 193 classes, but currently this is not a priority. So, unfortunately, we cannot promise to report these results soon.

lxtGH commented 3 years ago

I got it. Thank you very much!