penghao-wu / vstar

PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
https://vstar-seal.github.io/
MIT License

Some recommendations about your paper #6

Status: Open. dinhanhx opened this issue 6 months ago.

dinhanhx commented 6 months ago

I assume your paper is still under review and may change further, so some of my recommendations might already be irrelevant.

Version: https://arxiv.org/pdf/2312.14135v2.pdf

Figure 1

[screenshot of Figure 1]

This figure should be referenced in the text before it appears (e.g., in the paragraph where you first mention the V* mechanism). As it stands, the figure feels out of context to me: I can't tell which part of the text it relates to.

Algorithm 1

[screenshot of Algorithm 1]

You should explain what each symbol means.

penghao-wu commented 6 months ago

Thank you for your suggestions. We will make corresponding modifications in the next version.

mega-optimus commented 4 months ago

For Algorithm 1, I'm confused by the variables δ and s. What do they mean?

mega-optimus commented 4 months ago

OK, Algorithm 2 implicitly clarifies them: δ is a threshold, and s is the "search target description".
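The thread doesn't reproduce Algorithm 1 itself. Purely to illustrate how a threshold δ and a target description s can interact in a search loop, here is a hypothetical sketch: a generic best-first search over image quadrants that stops as soon as a detector's confidence for s reaches δ. The function names, the `detect` and `priority` callbacks, and the quadrant split are all assumptions made for illustration; this is not the paper's Algorithm 1 or code from the vstar repository.

```python
import heapq
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) region of the image

def threshold_guided_search(
    image_box: Box,
    s: str,                                 # search target description
    delta: float,                           # threshold δ on detection confidence
    detect: Callable[[Box, str], float],    # hypothetical: confidence that s is in the region
    priority: Callable[[Box, str], float],  # hypothetical: how promising a region looks for s
    min_size: int = 1,
) -> Optional[Box]:
    # Max-priority frontier implemented with heapq (a min-heap) via negated scores.
    frontier = [(-priority(image_box, s), image_box)]
    while frontier:
        _, box = heapq.heappop(frontier)
        # Stop as soon as the target is detected with confidence >= δ.
        if detect(box, s) >= delta:
            return box
        x0, y0, x1, y1 = box
        # Don't subdivide regions that are already at the minimum size.
        if x1 - x0 <= min_size or y1 - y0 <= min_size:
            continue
        mx, my = (x0 + x1) // 2, (y0 + y1) // 2
        for sub in [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]:
            heapq.heappush(frontier, (-priority(sub, s), sub))
    return None  # target never reached the threshold

# Toy usage: the "target" is the pixel (5, 5) in an 8x8 image.
TARGET = (5, 5)

def contains(box: Box) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= TARGET[0] < x1 and y0 <= TARGET[1] < y1

def toy_detect(box: Box, s: str) -> float:
    # Only "confident" once the region is small (at most 2x2) and contains the target.
    small = box[2] - box[0] <= 2 and box[3] - box[1] <= 2
    return 1.0 if contains(box) and small else 0.0

def toy_priority(box: Box, s: str) -> float:
    return 1.0 if contains(box) else 0.0

found = threshold_guided_search((0, 0, 8, 8), "red cup", 0.5, toy_detect, toy_priority)
print(found)
```

In this toy setup the search narrows from the full 8x8 image to the quadrant (4, 4, 8, 8) and then to (4, 4, 6, 6), where the detector's confidence first reaches δ = 0.5. Raising δ makes the search stricter; a lower δ lets it stop earlier on weaker evidence.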