meetps / pytorch-semseg

Semantic Segmentation Architectures Implemented in PyTorch
https://meetshah.dev/semantic-segmentation/deep-learning/pytorch/visdom/2017/06/01/semantic-segmentation-over-the-years.html
MIT License
3.39k stars, 796 forks

Pascal VOC 2012 #46

Closed by mrgloom 6 years ago

mrgloom commented 6 years ago

Have you balanced or weighted the classes for Pascal VOC 2012? See https://discuss.pytorch.org/t/multilabel-classification-under-unbalanced-class-distributions/2950

I can't see any specific code here: https://github.com/meetshah1995/pytorch-semseg/blob/master/ptsemseg/loader/pascal_voc_loader.py

meetps commented 6 years ago

Nope, I haven't. Pascal VOC is a fairly balanced dataset, so I didn't feel like doing it. It may be needed for Cityscapes or CamVid.

mrgloom commented 6 years ago

I counted the pixels per class for Pascal VOC 2012. Background is the most frequent class, at about 70% of the total area, so I consider the dataset unbalanced:

('aeroplane', ':', 1780580)
('bicycle', ':', 758311)
('bird', ':', 2232247)
('boat', ':', 1514260)
('bottle', ':', 1517186)
('bus', ':', 4375622)
('car', ':', 3494749)
('cat', ':', 6752515)
('chair', ':', 2861091)
('cow', ':', 2060925)
('diningtable', ':', 3381632)
('dog', ':', 4344951)
('horse', ':', 2283739)
('motorbike', ':', 2888641)
('person', ':', 11995853)
('potted_plant', ':', 1670340)
('sheep', ':', 2254463)
('sofa', ':', 3612229)
('train', ':', 3984238)
('tv/monitor', ':', 2349235)
('background', ':', 182014429)
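
To make the imbalance concrete, the counts above can be turned into per-class pixel fractions. This is only a minimal sketch: the dictionary copies the numbers from the listing, and the names `pixel_counts` and `class_fractions` are made up for illustration, not taken from the repository.

```python
# Pixel counts copied from the listing above (Pascal VOC 2012).
pixel_counts = {
    "aeroplane": 1780580, "bicycle": 758311, "bird": 2232247,
    "boat": 1514260, "bottle": 1517186, "bus": 4375622,
    "car": 3494749, "cat": 6752515, "chair": 2861091,
    "cow": 2060925, "diningtable": 3381632, "dog": 4344951,
    "horse": 2283739, "motorbike": 2888641, "person": 11995853,
    "potted_plant": 1670340, "sheep": 2254463, "sofa": 3612229,
    "train": 3984238, "tv/monitor": 2349235, "background": 182014429,
}


def class_fractions(counts):
    """Return each class's share of all counted pixels."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = class_fractions(pixel_counts)

# Print classes from most to least frequent.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<13}{frac:7.2%}")
```

With these counts, background works out to roughly 73% of all pixels, and the next largest class, person, to under 5%.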

I don't really understand why class weighting is not needed. For example, the FCN paper (https://arxiv.org/pdf/1605.06211.pdf) says:

Class balancing
Fully convolutional training can balance classes by weighting or sampling the loss. Although our labels are mildly unbalanced (about 3/4 are background), we find class balancing unnecessary.

But the SegNet paper (https://arxiv.org/pdf/1511.00561.pdf), for example, does use class weights:

We use the cross-entropy loss [2] as the objective function for training the network. The loss is summed up over all the pixels in a mini-batch. When there is large variation in the number of pixels in each class in the training set (e.g road, sky and building pixels dominate the CamVid dataset) then there is a need to weight the loss differently based on the true class. This is termed class balancing. We use median frequency balancing [13] where the weight assigned to a class in the loss function is the ratio of the median of class frequencies computed on the entire training set divided by the class frequency. This implies that larger classes in the training set have a weight smaller than 1 and the weights of the smallest classes are the highest. We also experimented with training the different variants without class balancing or equivalently using natural frequency balancing.

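
As a rough illustration of the weighting described in that quote, the sketch below computes weight = median(class frequencies) / class frequency from the pixel counts posted earlier in this thread. It rests on a simplifying assumption: the frequency of a class is taken as its pixel count divided by the total pixel count, because only aggregate counts are available here. The function name `median_frequency_weights` is hypothetical.

```python
from statistics import median

# Pixel counts copied from the listing earlier in this thread.
pixel_counts = {
    "aeroplane": 1780580, "bicycle": 758311, "bird": 2232247,
    "boat": 1514260, "bottle": 1517186, "bus": 4375622,
    "car": 3494749, "cat": 6752515, "chair": 2861091,
    "cow": 2060925, "diningtable": 3381632, "dog": 4344951,
    "horse": 2283739, "motorbike": 2888641, "person": 11995853,
    "potted_plant": 1670340, "sheep": 2254463, "sofa": 3612229,
    "train": 3984238, "tv/monitor": 2349235, "background": 182014429,
}


def median_frequency_weights(counts):
    """Weight per class = median class frequency / that class's frequency.

    Assumption: frequency = class pixels / all pixels (a simplification
    forced by having only aggregate counts).
    """
    total = sum(counts.values())
    freqs = {name: n / total for name, n in counts.items()}
    med = median(freqs.values())
    return {name: med / f for name, f in freqs.items()}


weights = median_frequency_weights(pixel_counts)

# Print classes from the smallest weight to the largest.
for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:<13}{w:7.3f}")
```

With these counts the median class is chair, so chair gets weight 1, background gets a weight far below 1, and bicycle, the rarest class, gets the largest weight.
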
meetps commented 6 years ago

If you have a look at the loss function, it ignores the background class (index 0) altogether.
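
For readers unsure what ignoring a class index in the loss means, here is a small pure-Python sketch of a per-pixel cross-entropy that skips pixels whose label equals `ignore_index` and optionally applies per-class weights. It is not the repository's loss function, and the function name and list-based inputs are assumptions made for illustration. In PyTorch, `torch.nn.CrossEntropyLoss` exposes `weight` and `ignore_index` arguments for the same purpose.

```python
import math


def cross_entropy(logits, targets, weights=None, ignore_index=None):
    """Weighted mean cross-entropy over pixels.

    logits:  list of per-pixel score lists, one score per class
    targets: list of integer class labels, one per pixel
    weights: optional list of per-class weights
    Pixels labelled `ignore_index` contribute nothing to the loss.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # skip ignored pixels entirely
        # Numerically stable log-sum-exp of the scores.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        w = 1.0 if weights is None else weights[target]
        loss_sum += w * (log_z - scores[target])
        weight_sum += w
    # Choice made for this sketch: return 0.0 if every pixel was ignored.
    return loss_sum / weight_sum if weight_sum > 0 else 0.0


# Three pixels, two classes; class 0 plays the role of background.
logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
targets = [0, 1, 0]

print(cross_entropy(logits, targets))                  # all pixels count
print(cross_entropy(logits, targets, ignore_index=0))  # only the class-1 pixel counts
```

Ignoring a class this way removes its pixels from both the sum and the normalisation, so a dominant class stops contributing to the loss at all, rather than being down-weighted as in median frequency balancing.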

Yes, I'm aware of the median frequency balancing used by SegNet. But most recent papers (PSPNet, FRRN, and RefineNet) do not use class balancing, so I'm not too keen on balancing the classes.