google / sg2im

Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 2018
Apache License 2.0

Vocab mismatch between checkpoint and paper #19

Closed aluo-x closed 4 years ago

aluo-x commented 4 years ago

Was just running the checkpoints for COCO & VG.

For VG there are indeed 45 relationships plus an "in_image" relationship, which matches the paper on arXiv. However, for COCO there are additional "touching" relationships, which bring the total number of non-"in_image" relationships to 10.

@jcjohnson could you potentially help clarify this question?

jcjohnson commented 4 years ago

In some of my earlier experiments I tried some additional relationships on COCO, defined as follows (replaces https://github.com/google/sg2im/blob/master/sg2im/data/coco.py#L337)

      touching = False
      if self.touching_relations:
        area_s = (sx1 - sx0) * (sy1 - sy0)
        area_o = (ox1 - ox0) * (oy1 - oy0)
        ix0, ix1 = max(sx0, ox0), min(sx1, ox1)
        iy0, iy1 = max(sy0, oy0), min(sy1, oy1)
        area_i = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        iou = area_i / (area_s + area_o - area_i)
        touching = 0.1 < iou < 0.5

      if sx0 < ox0 and sx1 > ox1 and sy0 < oy0 and sy1 > oy1:
        p = 'surrounding'
      elif sx0 > ox0 and sx1 < ox1 and sy0 > oy0 and sy1 < oy1:
        p = 'inside'
      elif theta >= 3 * math.pi / 4 or theta <= -3 * math.pi / 4:
        p = 'right touching' if touching else 'left of'
      elif -3 * math.pi / 4 <= theta < -math.pi / 4:
        p = 'bottom touching' if touching else 'above'
      elif -math.pi / 4 <= theta < math.pi / 4:
        p = 'left touching' if touching else 'right of'
      elif math.pi / 4 <= theta < 3 * math.pi / 4:
        p = 'top touching' if touching else 'below'
      p = self.vocab['pred_name_to_idx'][p]
      triples.append([s, p, o])
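
The fragment above depends on variables from the surrounding dataset loop (`sx0`, `theta`, `self.vocab`, and so on), so it cannot be run on its own. As a rough, standalone sketch of just the "touching" test, assuming boxes are given as `(x0, y0, x1, y1)` tuples, the IoU check could be isolated as below. The function names `box_iou` and `is_touching` are hypothetical and do not appear in the repository; the guard against a zero-area union is an addition that the fragment does not have.

```python
def box_iou(box_s, box_o):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    sx0, sy0, sx1, sy1 = box_s
    ox0, oy0, ox1, oy1 = box_o
    area_s = (sx1 - sx0) * (sy1 - sy0)
    area_o = (ox1 - ox0) * (oy1 - oy0)
    # Corners of the intersection rectangle (empty if the boxes are disjoint).
    ix0, ix1 = max(sx0, ox0), min(sx1, ox1)
    iy0, iy1 = max(sy0, oy0), min(sy1, oy1)
    area_i = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = area_s + area_o - area_i
    # Assumption: treat degenerate (zero-area) pairs as non-overlapping.
    return area_i / union if union > 0 else 0.0


def is_touching(box_s, box_o, lo=0.1, hi=0.5):
    """Same thresholds as the fragment: touching when 0.1 < IoU < 0.5."""
    return lo < box_iou(box_s, box_o) < hi


if __name__ == "__main__":
    # Two 2x2 boxes overlapping by half: IoU = 2 / 6, so they count as touching.
    print(is_touching((0, 0, 2, 2), (1, 0, 3, 2)))
    # Disjoint boxes have IoU 0 and are not touching.
    print(is_touching((0, 0, 1, 1), (2, 2, 3, 3)))
```

Note that identical or nearly identical boxes (IoU at or above 0.5) are not classed as touching under these thresholds.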

However, in the final models I didn't end up using these relationships. They are still present in the vocab of the pretrained models, but they were not used at all during training, so the embeddings associated with them in the released model weights are random. Thus, if you try to pass a scene graph with one of these "touching" relationships, you will probably get garbage output from the model.
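
One way to act on this is to screen scene-graph triples before passing them to the model. The sketch below is only an illustration: it assumes the vocab exposes a `pred_idx_to_name` list, and the toy vocab and its ordering are made up for the example rather than read from a released checkpoint. The helper name `split_triples` is hypothetical.

```python
# Assumption: untrained predicates are exactly those whose name contains "touching".
UNTRAINED_MARKER = "touching"

# Toy vocab for illustration only; the real ordering in a checkpoint may differ.
TOY_VOCAB = {
    "pred_idx_to_name": [
        "__in_image__",
        "left of", "right of", "above", "below", "inside", "surrounding",
        "left touching", "right touching", "top touching", "bottom touching",
    ],
}


def split_triples(triples, vocab):
    """Split [s, p, o] triples into (usable, suspect) by predicate name."""
    idx_to_name = vocab["pred_idx_to_name"]
    usable, suspect = [], []
    for s, p, o in triples:
        if UNTRAINED_MARKER in idx_to_name[p]:
            suspect.append([s, p, o])
        else:
            usable.append([s, p, o])
    return usable, suspect


if __name__ == "__main__":
    names = TOY_VOCAB["pred_idx_to_name"]
    # Count predicates other than "__in_image__"; this is 10 for the toy vocab.
    print(len([n for n in names if n != "__in_image__"]))
    usable, suspect = split_triples([[0, 1, 1], [1, 7, 2]], TOY_VOCAB)
    print(usable, suspect)
```

Triples that land in `suspect` can then be dropped or remapped to a trained predicate, depending on what the caller prefers.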

aluo-x commented 4 years ago

Many thanks for the impressively quick reply! Really appreciate the clarification!

jcjohnson commented 4 years ago

My reply times are usually bimodal: either I respond right away or it will fall out of my inbox and be forgotten forever!