open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0
8.23k stars · 2.61k forks

Binary Segmentation with Segmenter #2508

Closed — imemmul closed this 1 year ago

imemmul commented 1 year ago

Hello everyone, I am using Segmenter on a custom dataset. The dataset consists of 10k eddy velocity fields stored in MATLAB files, with binary ground-truth PNG images. I load these files, convert them to shape 3x256x256, and feed them to Segmenter. The dataset itself should be fine, because the same data trains successfully with U-Net, but with Segmenter I get no results. Even though the loss decreases dramatically ("decode.loss_ce": 0.00986), the model is not learning: it collapses to predicting the target class everywhere (the prediction is a blank white image), while acc_seg stays around 4%.

I am also using ignore_index = 0, with 0 for background and 1 for eddies. Is it okay to use ignore_index = 0? Does it matter whether it is 0 or 255?

Below you can see how a sample looks after loading the MATLAB file and normalizing it (the values can be negative, and I think the model should not have a problem with negatives):

[screenshot: normalized input sample]

and its ground truth:

[screenshot: binary ground-truth mask]

This is Segmenter's config; I tried num_classes = 1 and num_classes = 2, and got no results either way. All other parameters are defaults.

[screenshot: Segmenter config]

Thank you for your help!!

MengzhangLI commented 1 year ago

Hi, for a binary segmentation task, this doc may help you.
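From memory, the doc boils down to two possible decode_head setups; roughly (a sketch, not a full config, check the doc for the exact keys):

```python
# Option 1: two output channels + softmax cross-entropy.
decode_head = dict(
    num_classes=2,
    out_channels=2,
    loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))

# Option 2 (alternative): one output channel + sigmoid (binary cross-entropy).
decode_head = dict(
    num_classes=2,
    out_channels=1,
    loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0))
```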

imemmul commented 1 year ago

I actually used that documentation when setting up my config. I tried cross-entropy with the sigmoid option, using num_classes = 2 and out_channels = 1. What is wrong?

MengzhangLI commented 1 year ago

OK, so what happens if you set num_classes=2, out_channels=2 and use_sigmoid=False in CrossEntropyLoss?

In my opinion, the phenomenon you describe (loss decreasing dramatically while the model collapses instead of learning) is caused by the imbalanced ratio of foreground to background.
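If imbalance is the cause, one thing you could try is weighting the rare class in the loss. mmseg's CrossEntropyLoss accepts a class_weight list; the values below are placeholders you would tune:

```python
loss_decode = dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    class_weight=[1.0, 10.0],  # [background, foreground]; hypothetical ratio
    loss_weight=1.0)
```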

lucas-sancere commented 1 year ago

Hi, I face the same issue with Segmenter trained for binary segmentation on a custom dataset, though my Segmenter config is not the same.

After running inference I get images filled entirely with pixels of value 1, 1 being my foreground class. Even running inference on the validation set gives the same result.

My custom dataset is fairly balanced, around 30% foreground summed over all images. My annotation images are not RGB but grey-scale; I followed the corresponding note in the custom dataset documentation:

> The annotations are images of shape (H, W), the pixel values should fall in range [0, num_classes - 1]. You may use the 'P' mode of [pillow](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#palette) to create your annotation image with color.
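For reference, creating such an annotation with pillow could look roughly like this (a sketch; the mask content and palette are made up):

```python
import numpy as np
from PIL import Image

# Class-index mask: 0 = background, 1 = foreground.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[64:192, 64:192] = 1

ann = Image.fromarray(mask).convert('P')
# The palette only affects how the mask is displayed;
# the stored pixel values stay 0 and 1.
ann.putpalette([0, 0, 0, 255, 255, 255] + [0, 0, 0] * 254)
ann.save('sample_ann.png')
```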

During training my loss also decreases dramatically (see my previously raised issue).

I will retrain after changing my config according to the last comment in this thread.

MeowZheng commented 1 year ago

I think it is improper to use ignore_index = 0, as this option is for pixels whose prediction we do not care about: the loss function does not calculate the loss at these pixels. However, you need the model to identify whether a pixel is background (0) or eddy (1).

In mmseg the default value of ignore_index is 255, which is far from the category indices of common dataset annotations. The 0 index in your dataset is a category index that you cannot ignore, since, as mentioned above, you need to distinguish background (0) from eddies (1).
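A minimal PyTorch sketch of the effect (as far as I know, mmseg's cross-entropy passes ignore_index down to torch.nn.functional.cross_entropy); with ignore_index = 0 every background pixel would be silently dropped from the loss, like the 255-labeled pixel here:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 2, 2)             # (N, C, H, W), 2 classes
labels = torch.tensor([[[0, 1], [255, 1]]])  # (N, H, W); 255 = ignore

# The pixel labeled 255 contributes nothing to the loss or gradients.
loss = F.cross_entropy(logits, labels, ignore_index=255)
```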

lucas-sancere commented 1 year ago

Hi, thank you for your help,

In my case I didn't use ignore_index = 0; my main error was setting reduce_zero_label=True for the binary segmentation task (I had not changed it from my copy of configs/segmenter/segmenter_vit-l_mask_8x1_640x640_160k_ade20k.py, which I used as the basis for my custom config).

I am now using this config file (following MengzhangLI's last comment and correcting reduce_zero_label):

```python
_base_ = [
    '../_base_/models/segmenter_vit-b16_mask.py',
    '../_base_/datasets/BBLdataset.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_BBL_160k.py'
]
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segmenter/vit_large_p16_384_20220308-d4efb41d.pth'  # noqa

model = dict(
    pretrained=checkpoint,
    backbone=dict(
        type='VisionTransformer',
        img_size=(640, 640),
        embed_dims=1024,
        num_layers=24,
        num_heads=16),
    decode_head=dict(
        type='SegmenterMaskTransformerHead',
        in_channels=1024,
        channels=1024,
        num_classes=2,
        out_channels=2,
        num_heads=16,
        dropout_ratio=0.0,
        embed_dims=1024,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0, avg_non_ignore=True)),
    test_cfg=dict(mode='slide', crop_size=(640, 640), stride=(608, 608)))

optimizer = dict(lr=0.001, weight_decay=0.0)

img_norm_cfg = dict(  # widely used values: mean and std of the ImageNet-1K pretrained models
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

crop_size = (640, 640)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(type='Resize', img_scale=(2560, 640), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2560, 640),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2560, 640),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]), 
]

data = dict(
    # num_gpus: 8 -> batch_size: 8
    samples_per_gpu=1,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=val_pipeline),
    test=dict(pipeline=test_pipeline))
```

And running inference on the validation dataset (I don't have a test dataset yet) with the checkpoint obtained during training, I end up with a binary segmentation as output!

The only oddity is that my predicted background class now has pixel value 1 (instead of 0 as in the training dataset) and my predicted foreground class has pixel value 2 (instead of 1 in the training dataset). It is not a problem, I am just wondering what the reason is.

Maybe the Binary Segmentation and reduce_zero_label parts of the FAQ could be included in the main documentation?

I will run inference on a test dataset once I have one, and I will comment here if any issue shows up, though there is no reason to expect one.

Thank you for your help

MeowZheng commented 1 year ago

Hi @lucas-sancere, my reply was about the original issue; I didn't know what your config was when I wrote it.

Moreover, whether to set reduce_zero_label=True depends on the dataset and the task. If the dataset has 3 categories (0, 1, 2), the task is binary segmentation, and pixels with label 0 should be ignored, just set reduce_zero_label=True: pixels with label 0 will be ignored and we only classify the pixels with labels 1 and 2.
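Roughly, the remapping applied when annotations are loaded with reduce_zero_label=True looks like this (a numpy sketch of the behavior described above); with a 0/1 binary mask it turns every background pixel into the ignore value:

```python
import numpy as np

gt = np.array([[0, 1], [2, 1]], dtype=np.uint8)  # raw labels 0/1/2

gt[gt == 0] = 255    # label 0 -> ignore value
gt = gt - 1          # shift the remaining labels down by one
gt[gt == 254] = 255  # keep ignored pixels at 255 after the shift

print(gt)  # [[255   0]
           #  [  1   0]]
```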

MengzhangLI commented 1 year ago

> In my case I didn't use ignore_index = 0; my main error was setting reduce_zero_label=True for the binary segmentation task [full comment and config quoted above]

Hi Lucas, how many classes are in your BBLdataset? For example, the binary segmentation dataset DRIVE has two types, background and vessel.

It is weird that your predicted foreground class has pixel value 2, because if you only have two categories, only the values 0 and 1 are defined. Can you share more details about your BBLdataset?

Best,

lucas-sancere commented 1 year ago

> Hi @lucas-sancere, my reply was about the original issue. [...] Moreover, whether to set reduce_zero_label=True depends on the dataset and the task. [quoted from the comment above]

Hi @MeowZheng, I knew your response was about the issue itself and not my comment (that is why I specified "In my case"). However, I think it is relevant to comment here even though my config file is slightly different, because I encountered the exact same issue as @imemmul.

Thank you for adding explanations about reduce_zero_label=True! In my case I had only 0 and 1 as values, so discarding the 0s was incorrect.

lucas-sancere commented 1 year ago

Hi @MengzhangLI ,

My dataset has 2 categories, tumor regions and normal tissue. The normal tissue class is my background class, filled with pixels of value 0, whereas the tumor regions class is my foreground class, filled with pixels of value 1.

It is the exact same format as DRIVE in this regard. My input data are RGB images in TIFF format and my annotations are single-channel grey-level PNG files filled only with 0s and 1s.

aymanaboghonim commented 1 year ago

@lucas-sancere Did you solve your issue? I am facing the same issue. My dataset has 2 classes ('background', 'building'); images are RGB .jpg files while labels are .png files where 0 is background and 128 is the object.

lucas-sancere commented 1 year ago

> @lucas-sancere Did you solve your issue? I am facing the same issue. [quoted above]

Hi @aymanaboghonim, I am not sure you have the same issue as me. Is the dataset you are talking about your training dataset?

Following the documentation:

> The annotations are images of shape (H, W), the pixel values should fall in range [0, num_classes - 1]

Your training dataset should contain pixels of value 1 for your class building, not 128.
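If the masks really did contain 128, a hypothetical one-off remap before training could look like this (file names are placeholders):

```python
import numpy as np
from PIL import Image

mask = np.array(Image.open('label.png'))   # values {0, 128}
mask = (mask > 0).astype(np.uint8)         # -> values {0, 1}
Image.fromarray(mask).save('label_fixed.png')
```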

If you are talking about the pixel values of the inference output: no, I didn't solve that issue. I still get different pixel values in the prediction (1 for background and 2 for foreground) than in my training set (0 for background and 1 for foreground), but of course it is not a real problem.

aymanaboghonim commented 1 year ago

@lucas-sancere Thanks for your reply. Sorry, my labels are in range [0, 1], not [0, 128]. Could you please help me solve my issue? Here is my class to register the custom dataset:

```python
import os.path as osp

# mmseg 0.x import paths
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

classes = ('background', 'building')
palette = [[225, 228, 128], [50, 50, 50]]


@DATASETS.register_module()
class BuildingSegmentation(CustomDataset):
    CLASSES = classes
    PALETTE = palette

    def __init__(self, split, **kwargs):
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png',
                         split=split, ignore_index=255, **kwargs)
        assert osp.exists(self.img_dir) and self.split is not None
```

I set num_classes = 2 and reduce_zero_label = False. Is there anything else needed to handle my binary segmentation task? Here is a sample label image:

[image: sample label]

aymanaboghonim commented 1 year ago

The background metrics are good but the object (building) metrics are very poor.

[image: evaluation metrics]

lucas-sancere commented 1 year ago

Hi @aymanaboghonim,

Sorry, I cannot fully help as I am just a new user of mmsegmentation and did not participate in its development.

To me:

> set num_classes = 2 and reduce_zero_label = False.

looks fine.

Your data sample looks fine. I guess there is a specific LUT in the plot that renders the binary image red and black; if not, maybe the pixel value is not 1 for the foreground. A quick check is to print the unique values of a label file, as in the sketch below.
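Something like this (the file name is a placeholder):

```python
import numpy as np
from PIL import Image

# For a 2-class setup the only values should be 0 and 1
# (plus 255 if it is used as the ignore index).
print(np.unique(np.array(Image.open('label.png'))))
```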

Your training runs and the issue is with the metrics, so I would say the problem probably comes from the training data (maybe imbalance, some problematic images, bad annotations...). I hope you have solved your problem since your last message.