wkentaro / labelme

Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
https://labelme.io

decode image string embedded in json #116

Closed mluerig closed 6 years ago

mluerig commented 6 years ago

labelme saves everything (image and labels/polygons) in JSON files. I am trying to decode the image string that is stored along with the polygon points in the JSON files in Python, so I can convert both to a NumPy array and use the points to draw a binary mask (some models, like Mask R-CNN, don't use JSON but require NumPy arrays for training).

What type of encoding is used to store the image in the JSON files (under "imageData")?

wkentaro commented 6 years ago

Please see the example for how to convert the annotations to numpy array. https://github.com/wkentaro/labelme/blob/master/labelme/cli/json_to_dataset.py
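For reference, the imageData field holds the raw image file base64-encoded. A minimal sketch of decoding it to a NumPy array (the field name is labelme's; the helper name is mine):

```python
import base64
import io
import json

import numpy as np
import PIL.Image


def image_data_to_array(json_path):
    """Decode the base64 'imageData' field of a labelme JSON into a NumPy array."""
    with open(json_path) as f:
        data = json.load(f)
    image_bytes = base64.b64decode(data["imageData"])
    return np.asarray(PIL.Image.open(io.BytesIO(image_bytes)))
```

If I recall correctly, labelme ships a similar helper (labelme.utils.img_b64_to_arr) that does the same conversion.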

wkentaro commented 6 years ago

Also, labelme has the option --nodata, which skips saving imageData. You can then use imagePath, which is relative to the annotation file (JSON file). https://github.com/wkentaro/labelme#usage

mluerig commented 6 years ago

ah excellent, thanks. probably couldn't hurt to point to this in the readme

wkentaro commented 6 years ago

Ah, sorry. It is not described in the README.md on the top page. And you need to refer to: https://github.com/wkentaro/labelme/tree/master/examples/single_image#convert-to-dataset

mluerig commented 6 years ago

yep, I saw that. As you already implemented with the user warning it's fine for single files, but not for an actual workflow to create a full dataset. It might be nice to point to that utility script somewhere (e.g. in the tutorial), so it is clear how people can customize their own workflow. I feel like the need to adapt the format of the labels and training images to segmentation models is a common issue out there.

wkentaro commented 6 years ago

Recently I added a script that converts (multiple) JSON files to a VOC-like dataset (labelme2voc.py), with examples for semantic segmentation and instance segmentation. Is that what you are pointing at?

wkentaro commented 6 years ago

Actually, there is no script that converts labelme's JSON files to COCO format (I usually create my own script that converts the JSON files to NumPy .npz files, which can easily be loaded and used for training). https://github.com/wkentaro/labelme/issues/34 is the issue tracking that.
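A minimal sketch of that kind of JSON-to-.npz conversion, rasterizing polygon shapes with PIL (the JSON field names are labelme's; the function name, the .npz layout, and handling only polygon shapes are my assumptions):

```python
import json

import numpy as np
import PIL.Image
import PIL.ImageDraw


def json_to_npz(json_path, npz_path, label_to_id):
    """Rasterize labelme polygon shapes into an integer label mask, save as .npz."""
    with open(json_path) as f:
        data = json.load(f)
    h, w = data["imageHeight"], data["imageWidth"]
    mask = PIL.Image.new("I", (w, h), 0)  # 32-bit integer image, background = 0
    draw = PIL.ImageDraw.Draw(mask)
    for shape in data["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # rectangles/circles/points would need separate handling
        points = [tuple(p) for p in shape["points"]]
        draw.polygon(points, fill=label_to_id[shape["label"]])
    np.savez_compressed(npz_path, label=np.asarray(mask, dtype=np.int32))
```

Later shapes overwrite earlier ones where polygons overlap, so the order of "shapes" in the JSON matters.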

mluerig commented 6 years ago

ah yes, nice!

austinmw commented 6 years ago

Anyone have a script for converting multiple XML polygon or binary PNG segmentations to single json?

krishvishal commented 6 years ago

This code takes the .json files created by labelme as input, outputs object instance masks as binary .png images, and saves both the images and their masks in a neat folder structure. It is adapted from wkentaro's labelme2voc.py and is most useful for people using Mask R-CNN. If you have any queries, ask me.

import glob
import json
import os
import os.path as osp
import sys

import numpy as np
import PIL.Image
import cv2
import labelme

DATA_DIR = '/home/war/Downloads/initial_experiment/train'
OUT_DIR = DATA_DIR + '/data_and_masks'

# Build the class-name -> id mapping from labels.txt
# (line 0 is '__ignore__' -> -1, line 1 is '_background_' -> 0).
class_names = []
class_name_to_id = {}
with open(DATA_DIR + '/labels.txt') as f:
    for i, line in enumerate(f):
        class_id = i - 1  # starts with -1
        class_name = line.strip()
        class_name_to_id[class_name] = class_id
        if class_id == -1:
            assert class_name == '__ignore__'
            continue
        elif class_id == 0:
            assert class_name == '_background_'
        class_names.append(class_name)
class_names = tuple(class_names)
print('class_names:', class_names)

out_class_names_file = osp.join(DATA_DIR, 'class_names.txt')
with open(out_class_names_file, 'w') as f:
    f.writelines('\n'.join(class_names))
print('Saved class_names:', out_class_names_file)

if osp.exists(OUT_DIR):
    print('Output directory already exists:', OUT_DIR)
    sys.exit(1)
os.makedirs(OUT_DIR)

for label_file in sorted(glob.glob(osp.join(DATA_DIR, '*.json'))):
    with open(label_file) as f:
        base = osp.splitext(osp.basename(label_file))[0]
        data = json.load(f)
        img_file = osp.join(osp.dirname(label_file), data['imagePath'])
        img = np.asarray(PIL.Image.open(img_file))
        # Note: recent labelme versions return a (cls, ins) tuple here.
        lbl = labelme.utils.shapes_to_label(
            img_shape=img.shape,
            shapes=data['shapes'],
            label_name_to_value=class_name_to_id,
        )
        # Split the label map into one binary 0/255 mask per instance id
        # (hardcoded here for exactly two instance ids, 1 and 2).
        instance1 = np.copy(lbl)
        instance2 = np.copy(lbl)
        instance1[lbl == 2] = 0
        instance2[lbl == 1] = 0
        instance1 = (instance1 * 255).astype(np.uint8)
        instance2 = (instance2 // 2 * 255).astype(np.uint8)

        os.makedirs(osp.join(OUT_DIR, base, 'images'))
        PIL.Image.fromarray(img).save(
            osp.join(OUT_DIR, base, 'images', base + '.png'))
        os.makedirs(osp.join(OUT_DIR, base, 'masks'))
        cv2.imwrite(
            osp.join(OUT_DIR, base, 'masks', base + '_instance1.png'), instance1)
        cv2.imwrite(
            osp.join(OUT_DIR, base, 'masks', base + '_instance2.png'), instance2)
kaolin commented 5 years ago

Would this be useful to generalize for more labels?

mshoaibali commented 5 years ago

Does this code also work for .json files created by the VGG Image Annotator?

INF800 commented 3 years ago

Thank you everyone!

harshp777 commented 3 years ago

(quotes krishvishal's script above)

Can this be used to extract semantic segmentation masks from a labelme .json file?