open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Segmentation mask from coco style dataset is not entirely accurate #7816

Closed PhilippMarquardt closed 2 years ago

PhilippMarquardt commented 2 years ago

Hi, while trying to train a model on a cell segmentation dataset that requires quite accurate masks from the instance segmentation model, I've noticed that the predictions always introduce a fairly "big" gap between cells that is not present in the original dataset, which was converted to COCO format. While trying to find the error, I found out that decoding the segmentations with pycocotools doesn't give back the original segmentation: a few pixels are missing. That might not matter for many instance segmentation tasks, but for many medical instance segmentation tasks accurate masks are very important. I've created a small toy image that shows the problem: a single white circle on a black background (test_file).

To reproduce the problem, run the following code on the image with the white circle on the black background:

import json
import numpy as np
import cv2
import pycocotools
from pycocotools.coco import COCO

def create_coco_dict(file):
    '''
    Creates a COCO-style dataset dict for a single image file.
    '''
    files = {}
    files['info'] = {"year": 2222, "version": "1.0", "description": "Object detection", "date_created": "2222"}
    files['licenses'] = [{'id': 1,
      'name': 'GNU General Public License v3.0',
      'url': 'test'}]
    files["type"] = "instances"
    files['categories'] = []
    files["annotations"] = []
    files['images'] = []
    files['categories'].append({'id': 0, 'name': "0", 'supercategory': "0"})
    all_annos = 0                    
    im = cv2.imread(file, 0)
    empty = np.zeros_like(im)
    files['images'].append({'date_captured': '2021',
                              'file_name': file,
                              'id': 0,
                              'height': im.shape[0],
                              'width': im.shape[1]})

    tmp = im.copy()
    # get the contours of the image
    contours, hierarchy = cv2.findContours(tmp, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    for cnt, cont in enumerate(contours):
        segmentation = []
        xmin,ymin,width,height = cv2.boundingRect(cont) #bounding box
        if width * height < 3:
            continue
        image_height = im.shape[0]
        image_width = im.shape[1]

        #draw contour for verification
        cv2.drawContours(empty, [cont], 0, 255, -1)

        cont = cont.flatten().tolist() # flattened contour has the form [x1, y1, x2, y2, ..., x_n, y_n]
        # as in https://github.com/facebookresearch/Detectron/issues/100#issuecomment-362882830
        if len(cont) > 4: # only keep contours with at least 3 points (6 coordinate values)
            segmentation.append(cont)
        else:
            continue
        if len(segmentation) == 0: # skip if no valid segmentation was collected
            continue
        files["annotations"].append({'segmentation': segmentation,
                                      'area': width * height,
                                      'image_id': 0,
                                      'iscrowd':0,
                                      'bbox': [xmin,ymin,width,height],
                                      "category_id": 0,
                                      "id": all_annos})
        all_annos += 1

    # write the reference mask with all filled contours drawn
    cv2.imwrite("drawn_contours.png", empty)

    return files

filename = "test_file.png" # insert your filename here (the image with the white circle)

di = create_coco_dict(filename)
with open("test.json", "w") as handle:
    json.dump(di, handle)

coco_annotation = COCO("test.json")
ann_ids = coco_annotation.getAnnIds(imgIds=[0], iscrowd=None)
anns = coco_annotation.loadAnns(ann_ids)

mask = np.zeros(cv2.imread(filename,0).shape)
mask += coco_annotation.annToMask(anns[0])
cv2.imwrite("coco_out.png", mask * 255.)

assert(np.array_equal(cv2.imread("coco_out.png",0),cv2.imread("drawn_contours.png", 0))) #images are not equal even though they should be(?)

drawn_contours.png contains the contours of the original image drawn as filled masks, which is the correct and expected result. coco_out.png is the output obtained by decoding the segmentation from the COCO dataset. Comparing the two images, coco_out.png is missing just a few pixels, which makes it unusable for datasets that require accurate segmentation. Of course you are not the authors of pycocotools, but since this data format is used very often in this repo, I wondered whether this is a known problem (I've opened a similar issue in the cocoapi repo). If I made an error, please let me know :) Thanks
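To quantify the discrepancy instead of just eyeballing the two images, a small check like the following can be appended to the script above (a sketch; it only assumes the two PNGs written by the code exist):

import cv2
import numpy as np

expected = cv2.imread("drawn_contours.png", 0)
decoded = cv2.imread("coco_out.png", 0)
# pixels set in the drawn mask but lost after the polygon round trip
missing = np.logical_and(expected > 0, decoded == 0).sum()
# pixels that appear in the decoded mask but were not in the original
added = np.logical_and(expected == 0, decoded > 0).sum()
print(f"missing pixels: {missing}, added pixels: {added}")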

Appendix, in case you don't want to create the output images yourself (switch between them quickly to see the difference): [image: coco_out.png] [image: drawn_contours.png]

The effect is more visible on my dataset with cells: [image: mask decoded with pycocotools] [image: original mask as it should be]

ZwwWayne commented 2 years ago

Hi @PhilippMarquardt, indeed it is not very accurate if you convert the mask from bitmap to contour first and then convert it back to bitmap. Actually, you can choose to encode the mask directly into RLE format, which should be more accurate.
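For reference, a minimal sketch of that approach with pycocotools (the names below are illustrative and not part of the original script; it assumes a per-instance binary mask such as the filled contour drawn in the reproduction code):

import numpy as np
import cv2
from pycocotools import mask as mask_utils

# per-instance binary mask, e.g. the filled contour image from above
instance_mask = (cv2.imread("drawn_contours.png", 0) > 0).astype(np.uint8)

# encode the bitmap directly as compressed RLE (pycocotools expects Fortran order)
rle = mask_utils.encode(np.asfortranarray(instance_mask))
# decoding the RLE reproduces the original bitmap exactly, no pixels are lost
assert np.array_equal(mask_utils.decode(rle), instance_mask)

area = float(mask_utils.area(rle))
bbox = mask_utils.toBbox(rle).tolist()         # [x, y, width, height]
rle["counts"] = rle["counts"].decode("ascii")  # make the bytes JSON-serializable

annotation = {"segmentation": rle,             # RLE dict instead of a polygon list
              "area": area,
              "bbox": bbox,
              "iscrowd": 0,
              "image_id": 0,
              "category_id": 0,
              "id": 0}

COCO.annToMask accepts RLE segmentations as well as polygons, so the rest of the reproduction script should work unchanged with such annotations.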