Accessing geom segid for semantic segmentation?

After a lot of googling and trial and error, I found a solution that works for me. I am logging it here for any future users who want to obtain a segmentation map.

Set segmentation=True in the render function of the MuJoCo environment. This returns a (height, width, 2) array. I found a description of this output in the DeepMind Control Suite camera code:

  This is a (height, width, 2) int32 numpy array where the first channel contains the integer ID of the object at each pixel, and the second channel contains the corresponding object type (a value in the mjtObj enum). Background pixels are labeled (-1, -1).

However, this description does not match what I see from the mujoco-py render function. There, the channels appear to be swapped: the first channel contains the object type, and the second channel contains the object ID.
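To keep the rest of the code independent of the channel order, the split can be isolated in one place. This is a minimal sketch on a synthetic array, not mujoco-py code: the helper name `split_segmentation` and its `type_first` flag are hypothetical, and the value 5 for a geom is my assumption for mjOBJ_GEOM in the mjtObj enum (with mujoco-py you would read it from const.OBJ_GEOM instead).

```python
import numpy as np


def split_segmentation(seg, type_first=True):
    """Return (types, ids) from a (height, width, 2) segmentation array.

    type_first=True follows the channel order observed with mujoco-py;
    type_first=False follows the order described in the Control Suite comment.
    """
    seg = np.asarray(seg)
    if seg.ndim != 3 or seg.shape[2] != 2:
        raise ValueError(f"expected a (height, width, 2) array, got {seg.shape}")
    first, second = seg[:, :, 0], seg[:, :, 1]
    return (first, second) if type_first else (second, first)


# Synthetic 2x2 example: background pixels are (-1, -1),
# and one pixel is a geom (assumed type value 5) with ID 3.
seg = np.full((2, 2, 2), -1, dtype=np.int32)
seg[0, 1] = (5, 3)

types, ids = split_segmentation(seg)
print(types[0, 1], ids[0, 1])
```

If your renderer follows the other convention, pass type_first=False and the rest of the pipeline stays unchanged.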

Using the first channel, we can tell what type of object each pixel belongs to (mesh, geom, body, etc.) by comparing its value with the MuJoCo mjtObj enum values. The enum list is in the MuJoCo documentation, and the values are also accessible through mujoco_py (mujoco_py.generated.const).
Once you know the object type, you can use id2name, or the ID directly, to do what you want with the object. In my case, I wanted to color all the named objects with the prefix "robot" red, since I want to segment my robot.

Here is my example code for creating a segmentation image in which the robot geoms are red and the object geom is green.

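import numpy as np
from mujoco_py.generated import const

seg = env.render(segmentation=True)
types = seg[:, :, 0]  # object type (a value in the mjtObj enum)
ids = seg[:, :, 1]    # object ID
# Output image with the same height and width as the segmentation map
img = np.zeros(seg.shape[:2] + (3,), dtype=np.uint8)
geoms = types == const.OBJ_GEOM
geoms_ids = np.unique(ids[geoms])
for i in geoms_ids:
    name = env.sim.model.geom_id2name(i)
    if name is None:  # skip unnamed geoms
        continue
    mask = geoms & (ids == i)  # only geom pixels with this ID
    if "robot" in name:
        img[mask] = (255, 0, 0)  # robot geoms in red
    elif "object0" == name:
        img[mask] = (0, 255, 0)  # object geom in green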
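The same idea can be wrapped in a reusable function and checked without a simulator. This is only a sketch under stated assumptions: `colorize_geoms`, its `rules` argument, and the synthetic name table are hypothetical, and OBJ_GEOM = 5 is my assumption for the enum value (use const.OBJ_GEOM with mujoco-py). With a real environment, `id2name` would be env.sim.model.geom_id2name.

```python
import numpy as np

OBJ_GEOM = 5  # assumed value of mjOBJ_GEOM; with mujoco-py use const.OBJ_GEOM


def colorize_geoms(seg, id2name, rules):
    """Color geom pixels by name.

    seg:     (height, width, 2) array, channel 0 = object type, channel 1 = object ID
    id2name: callable mapping a geom ID to its name (or None if unnamed)
    rules:   list of (predicate, rgb) pairs; the first matching predicate wins
    """
    types, ids = seg[:, :, 0], seg[:, :, 1]
    img = np.zeros(seg.shape[:2] + (3,), dtype=np.uint8)
    geoms = types == OBJ_GEOM
    for i in np.unique(ids[geoms]):
        name = id2name(int(i))
        if name is None:  # unnamed geoms stay black
            continue
        for matches, color in rules:
            if matches(name):
                # Restrict to geom pixels so other object types sharing the ID are untouched
                img[geoms & (ids == i)] = color
                break
    return img


# Synthetic 2x3 scene standing in for a real render
names = {0: "robot0:link", 1: "object0", 2: None}
seg = np.full((2, 3, 2), -1, dtype=np.int32)  # background is (-1, -1)
seg[0, 0] = (OBJ_GEOM, 0)  # robot geom
seg[0, 1] = (OBJ_GEOM, 1)  # object geom
seg[1, 0] = (OBJ_GEOM, 2)  # unnamed geom
seg[1, 1] = (1, 0)         # a non-geom type that happens to share ID 0

rules = [
    (lambda n: "robot" in n, (255, 0, 0)),
    (lambda n: n == "object0", (0, 255, 0)),
]
img = colorize_geoms(seg, names.get, rules)
```

Passing the name lookup and color rules as arguments keeps the function testable on plain arrays and makes it easy to add more classes later.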

openai / mujoco-py

Accessing geom segid for semantic segmentation? #516