linhuixiao / CLIP-VG

[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
https://github.com/linhuixiao/CLIP-VG
Apache License 2.0
105 stars, 5 forks

Visualization results #10

Open L-JunJie opened 6 months ago

L-JunJie commented 6 months ago

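Hello, author. There are some parts of the code I don't understand. What is the final IoU computed from? I tried to visualize the results, but the model's final output does not localize the target; the predicted box is far too small. Alternatively, could you provide the visualization code?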

linhuixiao commented 6 months ago

@L-JunJie

Hi, your question is similar to https://github.com/linhuixiao/CLIP-VG/issues/11. I gave a specific answer in that issue, so referring to it may help.

As for your question: the model's output box is in normalized coordinates, so you only need the original image size. Multiply the horizontal box values by the original width and the vertical box values by the original height to recover pixel coordinates.
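
For illustration, a minimal sketch of that conversion, together with a plain IoU check, might look like the following. It is not the repository's own code: the function names `denormalize_box` and `box_iou` are hypothetical, and it assumes the box is given as `(cx, cy, w, h)` with every value in `[0, 1]`. If the preprocessing resized or padded the image, the normalized values may refer to that preprocessed image instead, in which case the scaling would need to be done against its size first.

```python
def denormalize_box(box, img_w, img_h):
    """Map a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Assumption: the box format is center-x, center-y, width, height,
    each normalized to [0, 1] relative to an image of size img_w x img_h.
    """
    cx, cy, w, h = box
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    # Clamp to the image so the box can be drawn safely.
    x1, x2 = max(0.0, x1), min(float(img_w), x2)
    y1, y2 = max(0.0, y1), min(float(img_h), y2)
    return x1, y1, x2, y2


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Example: a centered box covering half of a 640x480 image in each dimension.
pred = denormalize_box((0.5, 0.5, 0.5, 0.5), 640, 480)
```

The resulting `(x1, y1, x2, y2)` tuple can be passed to any drawing routine, and `box_iou` can compare it against a ground-truth box expressed in the same pixel coordinates.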