prometheus-eval / prometheus-vision

[ACL 2024 Findings & ICLR 2024 WS] An open-source evaluator VLM that offers reproducible evaluation and is inexpensive to use. Specifically designed for fine-grained evaluation against a customized score rubric, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.
https://prometheus-eval.github.io/prometheus-vision/
Apache License 2.0

[Question] Visualization question #1

Closed. Lala-chick closed this issue 7 months ago

Lala-chick commented 10 months ago

Hello. Thank you for sharing the model. I would like to visualize which parts of the image receive attention depending on the prompt. How can I do this kind of visualization with the model you shared?

suehyunpark commented 7 months ago

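Hello @Lala-chick, thank you for the question :) In our earlier work, Volcano, we visualized which parts of the image a LLaVA-based model attends to, and how strongly, while it generates a response. A detailed explanation and the code we used are in the llava/visualize folder of the Volcano codebase, so taking a look there should help. I will continue the answer in English below so that more users can read it.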


Hello, thanks for your interest in Prometheus-Vision! We hope we weren't too late to address your needs :)

I see that you're interested in the possibility of visualizing response-to-image attention using our model.

We would like to note that this exact method was introduced in our work Volcano (another project by @sylee0520 and me)! We recently released dedicated instructions for visualizing how much the model attends to image features during generation. The resulting heatmaps show which parts of the image the model attends to most. Below is a figure from our Volcano paper, produced with the exact implementation in our codebase.

[Figure from the Volcano paper: attention heatmap visualization]

Since both Volcano and Prometheus-Vision are based on LLaVA, we think the code in the Volcano repository would be a good starting point. The code for gathering image features and image feature attention (masks) is in llava/model/llava_arch_for_image_attention.py. The main code for processing attentions for visualization is in llava/visualize/run_volcano_with_image_attention.py, /llava/visualize/image_attention_heatmap.py, and /llava/visualize/text_attention_heatmap.py.
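To give a rough idea of what the post-processing involves, here is a minimal, dependency-free sketch. It is not the Volcano implementation: the function names (mean_image_attention, to_heatmap, upsample_nearest) are hypothetical, and it assumes you have already extracted head-averaged attention weights from the model, that the image tokens occupy one contiguous span of the input starting at image_start, and that the patches form a square grid (for instance, 576 image tokens would map to a 24x24 grid, if your vision encoder produces that many). Please treat the files listed above as the reference.

```python
from typing import List

Matrix = List[List[float]]


def mean_image_attention(attn_rows: Matrix, image_start: int, num_patches: int) -> List[float]:
    """Average attention over generated tokens, restricted to image-patch positions.

    Assumption: attn_rows[t][k] is the head-averaged attention weight that
    generated token t places on input position k.
    """
    if not attn_rows:
        raise ValueError("attn_rows must contain at least one generated token")
    totals = [0.0] * num_patches
    for row in attn_rows:
        patch_slice = row[image_start:image_start + num_patches]
        if len(patch_slice) != num_patches:
            raise ValueError("row is too short for the requested image-token span")
        for i, weight in enumerate(patch_slice):
            totals[i] += weight
    return [total / len(attn_rows) for total in totals]


def to_heatmap(patch_attn: List[float], grid_size: int) -> Matrix:
    """Min-max normalize per-patch attention and reshape it to grid_size x grid_size."""
    if len(patch_attn) != grid_size * grid_size:
        raise ValueError("expected grid_size**2 patch weights")
    lo, hi = min(patch_attn), max(patch_attn)
    span = hi - lo
    # A flat attention map has no contrast, so map it to all zeros.
    norm = [(w - lo) / span if span > 0 else 0.0 for w in patch_attn]
    return [norm[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]


def upsample_nearest(heatmap: Matrix, scale: int) -> Matrix:
    """Repeat each cell scale x scale times so the map can be overlaid on the image."""
    out: Matrix = []
    for row in heatmap:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


if __name__ == "__main__":
    # Toy input: 2 generated tokens; 1 text token, then a 2x2 image (4 patches), then 1 text token.
    attn = [
        [0.1, 0.0, 0.2, 0.4, 0.2, 0.1],
        [0.1, 0.2, 0.0, 0.4, 0.2, 0.1],
    ]
    patch_attn = mean_image_attention(attn, image_start=1, num_patches=4)
    heat = upsample_nearest(to_heatmap(patch_attn, grid_size=2), scale=2)
    for row in heat:
        print(row)
```

In practice you would do the same steps with tensors and then blend the upsampled map over the original image (for example with a colormap and some transparency); plain lists are used here only to keep the sketch self-contained.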

Hope this helps! If you have any questions about the visualization method, feel free to ask me (@suehyunpark).

suehyunpark commented 7 months ago

@Lala-chick, I hope this answered your question. If you have any further questions, please feel free to ask!