Closed codingS3b closed 2 years ago
From looking a bit deeper, I guess I can manage to fill in most of the fields myself. However, I'm puzzled about what kind of data is actually expected for the segmentations here (isn't .encode('utf8') usually called on strings?).
To sum it up, I think the expected structure is as follows:

'meta':
    'category_labels': dict  # maps category ids to category names
'sequences': list of dicts  # length is equal to the number of videos in the dataset

# Each dict in the 'sequences' list (i.e. the information for a single video) has the format
'id': int,
'width': int,
'height': int,
'image_paths': list of str,
'categories': dict,  # maps each occurring instance id of the video to the respective category id
'segmentations': list of dict  # length is equal to the number of frames in the video

# Each dict in the 'segmentations' list (i.e. the information for a single frame of the video)
# maps instance ids to some value that I'm not sure about

@Ali2500, are my assumptions correct? If so, I would only need a pointer on how to correctly format the 'segmentations' entries (or rather, the values in those dictionaries).
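As a rough illustration of the layout summarized above, here is a minimal sketch of a structure check using only the standard library. The function name check_dataset, the example paths, and the placeholder mask value are hypothetical and not part of the tool. The sketch also assumes that 'image_paths' has one entry per frame and that ids appear as string keys, since JSON object keys are always strings.

```python
import json


def check_dataset(dataset):
    """Raise AssertionError if `dataset` does not match the layout summarized above."""
    assert isinstance(dataset["meta"]["category_labels"], dict)
    for seq in dataset["sequences"]:
        assert isinstance(seq["id"], int)
        assert isinstance(seq["width"], int) and isinstance(seq["height"], int)
        assert all(isinstance(p, str) for p in seq["image_paths"])
        # Assumption: one segmentation dict per frame, one image path per frame.
        assert len(seq["segmentations"]) == len(seq["image_paths"])
        for frame in seq["segmentations"]:
            # Every instance id used in a frame needs an entry in 'categories'.
            assert set(frame) <= set(seq["categories"])
    return True


# Tiny hand-written example; the mask value is only a placeholder string.
example = {
    "meta": {"category_labels": {"1": "person"}},
    "sequences": [
        {
            "id": 0,
            "width": 4,
            "height": 3,
            "image_paths": ["video0/00000.jpg", "video0/00001.jpg"],
            "categories": {"7": 1},
            "segmentations": [{"7": "<mask value>"}, {}],
        }
    ],
}

# Round-trip through JSON to confirm the structure is serializable as-is.
check_dataset(json.loads(json.dumps(example)))
```

An empty dict for a frame simply means that no instance is annotated in that frame.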
Hi,
The RLE-encoded mask returned by pycocotools is in binary format (bytes), if I recall correctly, so you need to call .decode("utf-8") on it before dumping it to JSON.
So, given a numpy array mask of type uint8, the entry in the segmentations field for this frame and instance would be: pycocotools.mask.encode(np.asfortranarray(mask))["counts"].decode("utf-8"). For efficiency, we only store the actual RLE encoding and not the full dict returned by pycocotools, since the image dimensions are the same across a video.
I unfortunately don't have a conversion script at hand, but it seems you deciphered the format correctly. If I remember correctly, the bboxes and areas fields aren't used anywhere in the final code (though it doesn't hurt to have them).
Thanks for your help @Ali2500, I think I have now managed to get your format right. At least using the GenericVideoSequence class seems to work out fine!
I would love to try out your tool but am struggling to get the json file into the format that is expected by your code in order to pass my data in.
Currently my json file follows the format of the youtubevis challenge. That is, when applying json.load, the dictionary has the following structure:
where the structure for each video looks like
and the structure for each annotation looks like
and the structure for each category looks like
Do you have a script at hand, or maybe some tips on how to convert that into the file format you expect?
I would greatly appreciate your help here!
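For readers with the same problem, a conversion along the lines discussed in this thread could be sketched as follows. This is not the author's script. The function names are hypothetical, and the YouTube-VIS field names used here ("videos", "file_names", "annotations", "video_id", "category_id", per-frame "segmentations" that may be null, "categories") are assumptions about that format and should be checked against your own file. Using the annotation id as the instance id is likewise an assumption.

```python
import json


def ytvis_to_generic(ytvis, encode_rle):
    """Convert a loaded YouTube-VIS style dict into the layout from this thread.

    encode_rle: callable that turns one per-frame segmentation object into
    the string to store (hypothetical hook, so the mask library stays swappable).
    """
    sequences = {}
    for video in ytvis["videos"]:
        sequences[video["id"]] = {
            "id": video["id"],
            "width": video["width"],
            "height": video["height"],
            "image_paths": list(video["file_names"]),
            "categories": {},
            "segmentations": [{} for _ in video["file_names"]],
        }
    for ann in ytvis["annotations"]:
        seq = sequences[ann["video_id"]]
        instance_id = ann["id"]  # assumption: annotation id doubles as instance id
        seq["categories"][instance_id] = ann["category_id"]
        for t, seg in enumerate(ann["segmentations"]):
            if seg is not None:  # None: instance not visible in this frame
                seq["segmentations"][t][instance_id] = encode_rle(seg)
    labels = {c["id"]: c["name"] for c in ytvis["categories"]}
    return {"meta": {"category_labels": labels}, "sequences": list(sequences.values())}


def encode_rle_with_pycocotools(seg):
    """Assumes `seg` is an uncompressed RLE dict: {'size': [h, w], 'counts': [ints]}."""
    from pycocotools import mask as mask_utils  # lazy import, only needed here

    height, width = seg["size"]
    rle = mask_utils.frPyObjects(seg, height, width)
    return rle["counts"].decode("utf-8")
```

Usage would be roughly json.dump(ytvis_to_generic(json.load(fh), encode_rle_with_pycocotools), out_fh). Note that json.dump writes the integer ids as string keys.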