qubvel-org / segmentation_models.pytorch

Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
https://smp.readthedocs.io/
MIT License

Blank segmentation predictions - UNet #176

Closed · gireeshkbogu closed 2 years ago

gireeshkbogu commented 4 years ago

I ran a multi-class UNet segmentation model on 120 greyscale images (train: 100 images and masks, valid: 10, test: 10). The training F-score was good and the loss was low, but all of my predictions on the test images were empty.

# train model for 40 epochs (only 5 are run below)
# save the model here if the validation IoU improves over max_score
# double check whether the model is actually saved; this is what you really need for the GitHub repo

max_score = 0

for i in range(0, 5):

    print('\nEpoch: {}'.format(i))
    train_logs = train_epoch.run(train_loader)
    valid_logs = valid_epoch.run(valid_loader)

    # do something (save model, change lr, etc.)
    if max_score < valid_logs['iou_score']:
        max_score = valid_logs['iou_score']
        torch.save(model, './best_model.pth')
        print('Model saved!')

    if i == 25:
        optimizer.param_groups[0]['lr'] = 1e-5
        print('Decrease decoder learning rate to 1e-5!')

Epoch: 0
train: 100%|██████████| 12/12 [00:09<00:00,  1.31it/s, dice_loss - 0.1065, iou_score - 0.8085, fscore - 0.8935]
valid: 100%|██████████| 12/12 [00:00<00:00, 14.22it/s, dice_loss - 0.02418, iou_score - 0.953, fscore - 0.9758]
Model saved!

Epoch: 1
train: 100%|██████████| 12/12 [00:09<00:00,  1.32it/s, dice_loss - 0.1067, iou_score - 0.8077, fscore - 0.8933]
valid: 100%|██████████| 12/12 [00:00<00:00, 14.40it/s, dice_loss - 0.02418, iou_score - 0.953, fscore - 0.9758]

Epoch: 2
train: 100%|██████████| 12/12 [00:09<00:00,  1.32it/s, dice_loss - 0.1012, iou_score - 0.8173, fscore - 0.8988]
valid: 100%|██████████| 12/12 [00:00<00:00, 14.27it/s, dice_loss - 0.02418, iou_score - 0.953, fscore - 0.9758]

Epoch: 3
train: 100%|██████████| 12/12 [00:09<00:00,  1.29it/s, dice_loss - 0.1089, iou_score - 0.8046, fscore - 0.8911]
valid: 100%|██████████| 12/12 [00:00<00:00, 14.24it/s, dice_loss - 0.02418, iou_score - 0.953, fscore - 0.9758]

Epoch: 4
train: 100%|██████████| 12/12 [00:09<00:00,  1.30it/s, dice_loss - 0.1161, iou_score - 0.7943, fscore - 0.8839]
valid: 100%|██████████| 12/12 [00:00<00:00, 14.40it/s, dice_loss - 0.02418, iou_score - 0.953, fscore - 0.9758]
# evaluate model on test set
test_epoch = smp.utils.train.ValidEpoch(
    model=best_model,
    loss=loss,
    metrics=metrics,
    device=DEVICE,
)

logs = test_epoch.run(test_dataloader)
valid: 100%|██████████| 17/17 [00:01<00:00, 14.32it/s, dice_loss - 0.01753, iou_score - 0.9657, fscore - 0.9825]

Example output (ground truth followed by the prediction, for 2 different classes):

[screenshot: Screen Shot 2020-04-09 at 6 53 16 PM]
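
The aggregate iou_score and fscore reported above are averaged over all class channels, so a dominant background class can hide channels the model never predicts. A per-class breakdown makes this visible. Here is a minimal sketch, assuming one-hot masks and sigmoid/softmax outputs of shape [C, H, W]; the helper name iou_per_class is only illustrative, not part of smp:

import torch

def iou_per_class(pr_mask, gt_mask, eps=1e-7):
    """Per-channel IoU for one-hot masks of shape [C, H, W]; predictions are thresholded at 0.5."""
    pr_bin = (torch.as_tensor(pr_mask) > 0.5).float()
    gt = torch.as_tensor(gt_mask).float()
    scores = []
    for c in range(gt.shape[0]):
        inter = (pr_bin[c] * gt[c]).sum()
        union = pr_bin[c].sum() + gt[c].sum() - inter
        scores.append(((inter + eps) / (union + eps)).item())
    return scores

A class that is present in the ground truth but never predicted shows up here as an IoU near 0, even when the channel-averaged score looks excellent.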
qubvel commented 4 years ago

Debug the code carefully and check the output of each stage. You can write your own training loop to make this easier:


# simple example
for image, mask in loader:
    image = image.to(device)
    mask = mask.to(device)
    pred = model(image)  # e.g. check that predictions are not empty: pred.sum() > 0

    loss = criterion(pred, mask)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
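
One way to take that check a step further is to count which class wins the per-pixel argmax; if only channel 0 ever wins, the network has collapsed to the background class. A sketch, reusing model and image from the loop above (not part of the library):

import torch

with torch.no_grad():
    pred = model(image)                               # [N, C, H, W] logits or probabilities
    labels = pred.argmax(dim=1)                       # winning class per pixel, [N, H, W]
    counts = torch.bincount(labels.flatten(), minlength=pred.shape[1])
    print(counts)  # if only index 0 is non-zero, every pixel is predicted as background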
gireeshkbogu commented 4 years ago

I started debugging using the CamVid example. It seems the model doesn't work well on masks with many zero values. For example, car/building/sky get good predictions, while pedestrian/pole/signsymbol predictions are empty.

Also, in my data most pixels in the training masks belong to class 0 and only a few have label 1. Do you think the network predicts everything as 0, which gives a very low loss and blank segmentation output?

for i in range(15):
    n = np.random.choice(len(test_dataset))

    image_vis = test_dataset_vis[n][0].astype('uint8')
    image, gt_mask = test_dataset[n]

    gt_mask = gt_mask.squeeze()

    x_tensor = torch.from_numpy(image).to(DEVICE).unsqueeze(0)
    pr_mask = model.predict(x_tensor)
    pr_mask = (pr_mask.squeeze().cpu().numpy().round())

    # CLASSES = ['sky', 'building', 'pole', 'road', 'pavement', 
    #           'tree', 'signsymbol', 'fence', 'car', 'pedestrian', 'bicyclist', 'unlabelled']

    visualize(
        image=denormalize(image_vis.squeeze()),
        gt_sky=gt_mask[0].squeeze(),
        pr_sky=pr_mask[0].squeeze(),
        gt_building=gt_mask[1].squeeze(),
        pr_building=pr_mask[1].squeeze(),
        gt_pole=gt_mask[2].squeeze(),
        pr_pole=pr_mask[2].squeeze(),
        gt_road=gt_mask[3].squeeze(),
        pr_road=pr_mask[3].squeeze(),
        gt_pavement=gt_mask[4].squeeze(),
        pr_pavement=pr_mask[4].squeeze(),
        gt_tree=gt_mask[5].squeeze(),
        pr_tree=pr_mask[5].squeeze(),
        gt_signsymbol=gt_mask[6].squeeze(),
        pr_signsymbol=pr_mask[6].squeeze(),
        gt_fence=gt_mask[7].squeeze(),
        pr_fence=pr_mask[7].squeeze(),
        gt_car=gt_mask[8].squeeze(),
        pr_car=pr_mask[8].squeeze(),
        gt_pedestrian=gt_mask[9].squeeze(),
        pr_pedestrian=pr_mask[9].squeeze(),
        gt_bicyclist=gt_mask[10].squeeze(),
        pr_bicyclist=pr_mask[10].squeeze(),
        gt_unlabelled=gt_mask[11].squeeze(),
        pr_unlabelled=pr_mask[11].squeeze()
    )
[screenshot: Screen Shot 2020-04-10 at 10 26 58 AM]

Let's zoom in on the pedestrian masks:

[screenshot: Screen Shot 2020-04-10 at 10 28 05 AM]

Also, when I ran the model on my data with just one class and sigmoid activation, I do get prediction masks. So I think the diagnosis above is correct: somehow softmax2d has a hard time with masks that contain too many zeroes.

[screenshot: Screen Shot 2020-04-10 at 11 12 20 AM]

Maybe applying softmax only to the values != 0 would solve this problem, like here where they use masked_softmax?
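
For what it's worth, the imbalance can be measured directly from the training masks, and a class-weighted loss is one common counter-measure. The sketch below is an illustration only: train_dataset, the inverse-frequency weighting and the switch to CrossEntropyLoss with integer targets are assumptions, not something used in this thread.

import torch
import torch.nn as nn

# Measure how skewed the one-hot masks (assumed channel-first, [C, H, W]) are.
num_classes = 12                                   # e.g. the CamVid class list above
pixels_per_class = torch.zeros(num_classes, dtype=torch.float64)
for _, mask in train_dataset:                      # assumed (image, one-hot mask) pairs
    pixels_per_class += torch.as_tensor(mask).double().reshape(num_classes, -1).sum(dim=1)

freq = pixels_per_class / pixels_per_class.sum()
print(freq)                                        # background near 1.0, minority classes near 0

# One common counter-measure: weight the loss by inverse class frequency so that
# predicting background everywhere is no longer a low-loss solution.
weights = 1.0 / (freq + 1e-6)
weights = (weights / weights.sum()).float()
criterion = nn.CrossEntropyLoss(weight=weights)    # note: expects integer class-index targets,
                                                   # e.g. one_hot_mask.argmax(dim=1) for [N, C, H, W] masks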

TimbusCalin commented 3 years ago

Did you manage to solve this problem? If so, how did you solve it?

Thank you 👍

TimbusCalin commented 3 years ago

> Debug the code carefully and check the output of each stage. You can write your own training loop to make this easier: [...]

I can confirm the same phenomenon: it happens only when the ground truth contains a lot of 0s (void/empty class).
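
To see why the channel-averaged score stays high in exactly this situation, here is a tiny self-contained sketch in plain PyTorch (not the smp metric; the exact numbers depend on how the real metric treats empty channels):

import torch

gt = torch.zeros(12, 64, 64)            # one-hot ground truth, 12 classes
gt[0] = 1.0                              # background everywhere ...
gt[9, 30:34, 30:34] = 1.0                # ... except a few "pedestrian" pixels
gt[0, 30:34, 30:34] = 0.0

pred = torch.zeros_like(gt)
pred[0] = 1.0                            # the model predicts background for every pixel

eps = 1e-7
inter = (pred * gt).sum(dim=(1, 2))
union = pred.sum(dim=(1, 2)) + gt.sum(dim=(1, 2)) - inter
iou = (inter + eps) / (union + eps)
print(iou.mean())                        # about 0.92, although the pedestrian class is missed entirely

Most channels are empty in both tensors and score 1.0 with the epsilon smoothing, which is what drags the average up.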

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 7 days with no activity.