I apologize for the late response.
The box is the content that should be given for inference. You can use the OCR information that exists in the dataset, or use your own box.

As you can see in "Instructions for proposed method" in the README, the inference code has the argument `attention_vis`. You can get TSR and TSSR using this argument. If you want to get something that is not synthesized with the image for visualization, you can use the model's seventh output, `att_set`:
https://github.com/naver/garnet/blob/c5a7adb684979ecaa25603dd1aa71a8283d5dadb/CODE/inference.py#L106-L121
- `att_set[i][0]`: TSR on the i-th layer
- `att_set[i][1]`: TSSR on the i-th layer
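As a small sketch, assuming `att_set` is a per-layer sequence where each entry holds the `(TSR, TSSR)` pair as indexed above (the function name `collect_attention_maps` is hypothetical, not part of the repository):

```python
def collect_attention_maps(att_set):
    """Split att_set into per-layer TSR and TSSR maps.

    Assumes att_set[i][0] is the TSR map and att_set[i][1] is the
    TSSR map for the i-th layer, as described above.
    """
    tsr_maps = [layer[0] for layer in att_set]   # TSR per layer
    tssr_maps = [layer[1] for layer in att_set]  # TSSR per layer
    return tsr_maps, tssr_maps
```

You would call this on the seventh output of the model after running inference with `attention_vis` enabled, then visualize the raw maps without synthesizing them onto the image.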