yrcong / STTran

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
MIT License
181 stars 34 forks source link

About Frame Encoding #25

Closed FXLYZ closed 2 years ago

FXLYZ commented 2 years ago

Hi, In the Temporal Decoder,

"we customize the Frame Encodings to inject the temporal position in the relationship representations. The frame encodings E f are constructed with learned embedding parameters"

I can't understand this sentence, can you explain Frame Encoding? Thanks.

yrcong commented 2 years ago

Frame encodings help the model understand the sequence of the relationships in the sliding window. Assumed that there are 2 relations in the (t) th frame and 3 in the (t+1) th frame. Due to permutation invariant, we need to inject two different encodings into these 5 relations. best, Yuren