langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

LLaVA model error in VLLM through Langchain #24078

Open tsantra opened 1 month ago

tsantra commented 1 month ago

Checked other resources

Example Code

from langchain_community.llms import VLLM

llm = VLLM(
    model="llava-hf/llava-1.5-7b-hf",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)

OR

llm = VLLM(
    model="llava-hf/llava-1.5-7b-hf",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
    image_input_type="pixel_values",
    image_token_id=123,
    image_input_shape="224,224,3",
    image_feature_size=512,
)

Both ways of instantiating the VLLM class give the same error.
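A possible workaround (untested sketch; it assumes the installed langchain-community version exposes the wrapper's vllm_kwargs field and that vLLM 0.4.2 accepts these vision settings as engine arguments) is to route the vision configuration through vllm_kwargs, which the wrapper forwards to the underlying vllm.LLM constructor instead of validating as pydantic fields:

from langchain_community.llms import VLLM

# Sketch: forward vision-related engine arguments via vllm_kwargs so they
# reach vllm.LLM / EngineArgs rather than being rejected by the wrapper's
# pydantic validation. Values are copied from the example above.
llm = VLLM(
    model="llava-hf/llava-1.5-7b-hf",
    trust_remote_code=True,
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
    vllm_kwargs={
        "image_input_type": "pixel_values",
        "image_token_id": 123,            # placeholder from the example above
        "image_input_shape": "224,224,3",  # placeholder from the example above
        "image_feature_size": 512,         # placeholder from the example above
    },
)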

Error Message and Stack Trace (if applicable)

llm = VLLM(

rank0: File "/miniforge3/envs/ipex-vllm/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
rank0:     raise validation_error
rank0: pydantic.v1.error_wrappers.ValidationError: 1 validation error for VLLM
rank0: Provide image_input_type and other vision related configurations through LLM entrypoint or engine arguments. (type=assertion_error)

Description

I am trying to use vLLM through LangChain to run the LLaVA model. I am running the code on CPU. I get this error: "Provide image_input_type and other vision related configurations through LLM entrypoint or engine arguments."

I went through the source code of vllm/vllm/engine/arg_utils.py (class EngineArgs) and passed the vision configuration to the VLLM class as shown above. However, even after setting image_input_type="pixel_values" on the VLLM class, self.image_input_type in EngineArgs is still None.
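To check whether the engine itself accepts these settings, one option (untested sketch; it assumes the vLLM 0.4.2 LLM entrypoint forwards extra keyword arguments to EngineArgs, and it reuses the placeholder values from above) is to instantiate vllm.LLM directly, bypassing the LangChain wrapper:

from vllm import LLM

# Sketch: pass the vision configuration straight to the vLLM entrypoint.
# If this also fails, the problem is in the engine arguments themselves
# rather than in the LangChain wrapper.
engine = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    trust_remote_code=True,
    image_input_type="pixel_values",
    image_token_id=123,            # placeholder from the example above
    image_input_shape="224,224,3",  # placeholder from the example above
    image_feature_size=512,         # placeholder from the example above
)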

Name: vllm
Version: 0.4.2+cpu
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License: Apache 2.0
Location: /home/ceed-user/miniforge3/envs/ipex-vllm/lib/python3.11/site-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg
Requires: cmake, fastapi, filelock, lm-format-enforcer, ninja, numpy, openai, outlines, prometheus-fastapi-instrumentator, prometheus_client, psutil, py-cpuinfo, pydantic, requests, sentencepiece, tiktoken, tokenizers, torch, transformers, triton, typing_extensions, uvicorn
Required-by:

System Info

langchain==0.2.7
langchain-community==0.2.7
langchain-core==0.2.12
langchain-text-splitters==0.2.2

tsantra commented 1 month ago

I also want to use this LLM through vLLM in a RAG pipeline. Does LLaVA through vLLM support multiple images as input (which would be needed in a RAG pipeline)?

tsantra commented 1 month ago

Any update on this?