penghao-wu / vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
https://vstar-seal.github.io/
MIT License
497 stars 32 forks

What are the supported instances? #7

Open Tourbillon opened 7 months ago

Tourbillon commented 7 months ago

I tried several instances, but couldn't find a corresponding result; only the original result was returned. I tried questions like "What number is the pointer pointing to?" and "What color is the person on the left wearing?"

Spring24ch commented 7 months ago

I tried several instances, but couldn't find a corresponding result, and only returned the original result. I tried questions like, "What number is the pointer pointing to?" and "What color is the person on the left wearing?"

Did you get it to run end to end? I have a question for you: I always get the following error.

error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [13 lines of output]
    Traceback (most recent call last):
      File "", line 2, in
      File "", line 34, in
      File "C:\Users\dellA2061\AppData\Local\Temp\pip-install-ij8a3p3g\deepspeed_17972d03d096414293a1589cf989820e\setup.py", line 163, in
        abort(f"Unable to pre-compile {op_name}")
      File "C:\Users\dellA2061\AppData\Local\Temp\pip-install-ij8a3p3g\deepspeed_17972d03d096414293a1589cf989820e\setup.py", line 51, in abort
        assert False, msg
    AssertionError: Unable to pre-compile async_io
    DS_BUILD_OPS=1
     [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
     [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
     [WARNING] One can disable async_io with DS_BUILD_AIO=0
     [ERROR] Unable to pre-compile async_io
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

penghao-wu commented 7 months ago

I tried several instances, but couldn't find a corresponding result, and only returned the original result. I tried questions like, "What number is the pointer pointing to?" and "What color is the person on the left wearing?"

The visual search mechanism will be activated only when the VQA LLM finds that certain entities in the questions cannot be located.
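
The gating described above can be illustrated with a minimal sketch. This is not the repository's actual code: `answer_question`, the three callables, and the toy stand-ins are hypothetical names chosen for illustration, and the sketch assumes the first pass of the VQA LLM can report which entities it failed to locate. It only shows why a question whose targets are all locatable comes back with the original, direct answer.

```python
from typing import Callable, Dict, List, Tuple


def answer_question(
    question: str,
    first_pass: Callable[[str], Tuple[str, List[str]]],
    visual_search: Callable[[List[str]], Dict[str, str]],
    second_pass: Callable[[str, Dict[str, str]], str],
) -> Tuple[str, bool]:
    """Return (answer, search_was_used). All names here are hypothetical."""
    answer, missing = first_pass(question)
    if not missing:
        # Every entity was located, so the direct answer is returned unchanged
        # and visual search is never triggered.
        return answer, False
    # Some entities could not be located: search for them, then answer again.
    crops = visual_search(missing)
    return second_pass(question, crops), True


# Toy stand-ins so the sketch runs without any model weights.
VISIBLE = {"person", "pointer"}  # assumed: entities the first pass can locate


def toy_first_pass(question: str) -> Tuple[str, List[str]]:
    targets = [w for w in ("person", "pointer", "mug") if w in question.lower()]
    missing = [t for t in targets if t not in VISIBLE]
    return "direct answer", missing


def toy_visual_search(missing: List[str]) -> Dict[str, str]:
    return {name: f"crop of {name}" for name in missing}


def toy_second_pass(question: str, crops: Dict[str, str]) -> str:
    return "answer using " + ", ".join(sorted(crops.values()))


direct = answer_question(
    "What color is the person on the left wearing?",
    toy_first_pass, toy_visual_search, toy_second_pass,
)
searched = answer_question(
    "What color is the mug?",
    toy_first_pass, toy_visual_search, toy_second_pass,
)
print(direct)    # ('direct answer', False)
print(searched)  # ('answer using crop of mug', True)
```

In this toy setup the first question never reaches the search step because nothing is reported as missing, which matches the behavior of only getting the original result back.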