henghuiding / Vision-Language-Transformer

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
MIT License

How to run inference on my own image and text? #7

Open kelisiya opened 2 years ago

changliu19 commented 2 years ago

Hi,

Please see issue #3 and the input format at line 102 of callbacks/eval.py:

https://github.com/henghuiding/Vision-Language-Transformer/blob/9b24015566fa820e3eddbbd8942fa44512ec1b3c/callbacks/eval.py#L102