stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy.ai
MIT License

Run with VLM #1792

Open Samjith888 opened 4 days ago

Samjith888 commented 4 days ago

Thanks for adding support for VLMs.

I was using this notebook. I tried Qwen2-VL-7B-Instruct and Llama-3.2-11B-Vision-Instruct, but in the script the model names are prefixed with openai/ (openai/meta-llama/... and openai/Qwen/...), so it asks for OpenAI's API key too. Is there any other way to use these models without OpenAI?

[screenshot of the error]

MohammedAlhajji commented 4 days ago

The openai/ prefix just lets LiteLLM know that this is an OpenAI-compatible endpoint, so it knows how to call it. As the api_key you can give it anything, and it will be accepted if there is no authentication set up on your endpoint. For example, because I manage my own deployment, I use api_key="fake-key". You just have to put something; it does not need to be an actual API key if there is no authentication on the endpoint.
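
A minimal sketch of what I mean (the model name, port, and key here are illustrative; adjust them to your own deployment):

```python
import dspy

# Any non-empty api_key is accepted when the endpoint has no authentication.
lm = dspy.LM(
    model="openai/Qwen/Qwen2-VL-7B-Instruct",  # "openai/" prefix = OpenAI-compatible endpoint
    api_base="http://localhost:8000/v1",       # wherever your server is listening
    api_key="fake-key",
)
```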

Samjith888 commented 4 days ago

I tried it as suggested:

```python
import dspy
from dspy.datasets import DataLoader
from dspy.evaluate.metrics import answer_exact_match
from typing import List
from dspy.evaluate import Evaluate

import dotenv
import litellm

litellm.suppress_debug_info = True

dotenv.load_dotenv()

def debug_exact_match(example, pred, trace=None, frac=1.0):
    print(example.inputs())
    print(example.answer)
    print(pred)
    return answer_exact_match(example, pred, trace, frac)

qwen_lm = dspy.LM(model="openai/Qwen/Qwen2-VL-7B-Instruct", api_base="http://localhost:8000/v1", api_key="fake-key", max_tokens=5000)

dspy.settings.configure(lm=qwen_lm)

class DogPictureSignature(dspy.Signature):
    """Answer the question based on the image."""
    image: dspy.Image = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

class DogPicture(dspy.Module):
    def __init__(self) -> None:
        super().__init__()  # initialize dspy.Module internals
        self.predictor = dspy.ChainOfThought(DogPictureSignature)

    def __call__(self, **kwargs):
        return self.predictor(**kwargs)

dog_picture = DogPicture()

example = dspy.Example(image=dspy.Image.from_url("https://i.pinimg.com/564x/78/f9/6d/78f96d0314d39a1b8a849005123e166d.jpg"), question="What is the breed of the dog in the image?").with_inputs("image", "question")
print(dog_picture(**example.inputs()))
```

Getting an error: [screenshot of the error]

MohammedAlhajji commented 4 days ago

Try setting litellm.set_verbose = True and look at the curl command it outputs. Run that command; if it doesn't work, then something is wrong in the configuration (the api_base may not be correct, or something like that). Also try a smaller prompt, something like qwen_lm("hi"), just to get a clean curl command you can play with.
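
Something like this (reusing the qwen_lm from your script above):

```python
import litellm

litellm.set_verbose = True  # make LiteLLM print the underlying request it sends

# A tiny one-off call keeps the logged request small and easy to replay:
qwen_lm("hi")
```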

okhat commented 3 days ago

@Samjith888 Sorry if this is obvious, but have you launched Qwen with vLLM or SGLang first?

See dspy.ai for instructions on launching LMs; it's now on the (new) landing page.
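
For example, one common way to serve a model behind an OpenAI-compatible endpoint with vLLM (model and port illustrative):

```
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-VL-7B-Instruct --port 8000
```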

danilotpnta commented 2 days ago

Hi, I am also interested in this. Do I understand correctly that to use local LLMs we need to start a server with SGLang so that we can use it in the dspy.LM module? What about using only vLLM? I am looking at the documentation on "Local LMs on a GPU server" and trying to use EleutherAI/gpt-j-6B for prompt optimization. I was loading the model using HFModel but ran into some problems. You can take a look at what I was trying in this notebook, and at using only vLLM in this script.

Thanks for the support!

Samjith888 commented 2 days ago

@okhat The Qwen model works with vLLM; I tested it.

danilotpnta commented 1 day ago

@Samjith888 did you use SGLang, or did you launch a vLLM server and use HFClientVLLM?

```
python -m vllm.entrypoints.openai.api_server --model mosaicml/mpt-7b --port 8000
```

Samjith888 commented 1 day ago

No @danilotpnta, I didn't try that.

okhat commented 1 day ago

@Samjith888 @danilotpnta Yes, you need to launch SGLang or vLLM (or similar things like TGI).

That's going to resolve the issue. Is there a reason you wouldn't want to do this?

(Separately, @danilotpnta: EleutherAI/gpt-j-6B is an extremely undertrained and weak model. I don't think you can get it to do much. Why not use a Llama-3 base or instruct model of the same size?)

danilotpnta commented 1 day ago

@okhat thanks for the reply!

Indeed, I have launched a vLLM server by running:

```
python -m vllm.entrypoints.openai.api_server --model EleutherAI/gpt-j-6B --port 8000
```

I am using this script to compare the outputs of dspy.LM vs dspy.HFClientVLLM. Conceptually, they should give the same output. However, I am puzzled to find that:

1. Using dspy.LM, I get more than one query-response generation. I am not sure why this happens, since I initially thought the migration was plug and play. You can see it in the log below, comparing the different LM instances:

<details>
<summary>View log.txt</summary>

```plaintext
-- Questions using new dspy.LM --

New response Paris

New response Paris.

New response The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page.

-- Questions using HFClientVLLM --

WARNING:root: In DSPy 2.5, all LM clients except dspy.LM are deprecated, underperform, and are about to be deleted. You are using the client HFClientVLLM, which will be removed in DSPy 2.6. Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs.

            Learn more about the changes and how to migrate at
            https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb

New response Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


Question: What is the capital of France? Response: Paris


New response Paris


Question: What is the capital of France? Reasoning: Let's think step by step in order to ${produce the answer}. We ... Answer: Paris


Question: What is the capital of France? Reasoning: Let's think step by step in order to ${produce the answer}. We ... Answer: Paris


Question: What is the capital of France? Reasoning: Let's think step by step in order to ${produce the answer}. We ... Answer: Paris


Question: What is the capital of France? Reasoning: Let's think

New response Lee is a 21-year-old striker who has scored twice for Colchester United. He has two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed.


Document: The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice


```
</details>

2. It looks like, from the log.txt output of the summary module, that dspy.HFClientVLLM actually generates a somewhat coherent answer, while dspy.LM seems to output the input itself.

It could be some routing issue with LiteLLM, but I can't seem to figure out how to obtain the same behaviour.

That said, the reason to use this model is a reproducibility study. Basically, we are trying to improve on the query-generation part of a toolkit called [InPars](https://github.com/zetaalphavector/InPars/blob/master/inpars/prompts/templates.yaml), and we believe DSPy can certainly improve upon their static prompting.

Btw, we recently talked to the folks from Zeta Alpha (Jakub and the InPars authors), and I saw an interview they did with you about DSPy. Cool stuff!

okhat commented 1 day ago

Thanks for the very nicely presented summary, @danilotpnta! Some comments below.

> initially thought the migration was plug and play

It's a plug-and-play code change, but the behavior is very different under the hood. Can you show me how you're setting up the client? Here's how I'd set it up if you really think EleutherAI/gpt-j-6B is the right choice, but keep in mind that using DSPy to optimize prompts for a base LM like this (one that wasn't instruction-tuned) is not a very common use case.
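
Something along these lines (assuming the model is served behind a local OpenAI-compatible vLLM endpoint; the port and model_type here are illustrative):

```python
import dspy

# Assumes gpt-j-6B is already being served, e.g. via:
#   python -m vllm.entrypoints.openai.api_server --model EleutherAI/gpt-j-6B --port 8000
lm = dspy.LM(
    model="openai/EleutherAI/gpt-j-6B",  # "openai/" prefix = OpenAI-compatible endpoint
    api_base="http://localhost:8000/v1",
    api_key="fake-key",                  # placeholder; a local server has no auth
    model_type="text",                   # base model: completion-style rather than chat
)
dspy.configure(lm=lm)
```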

What you might need to do is look into how DSPy's Adapters work. These are the components that translate a signature into a prompt, before (or rather, irrespective of) prompt optimization. DSPy 2.5 uses a more chat-like adapter by default ("ChatAdapter"), but for a base model the older completion-style approach may perhaps be a better fit.
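
For instance, a sketch of selecting an adapter explicitly (exact adapter names and the configure API may vary by DSPy version):

```python
import dspy

# ChatAdapter is the 2.5 default; shown here only to make the choice explicit.
# For a base (non-instruction-tuned) model, experimenting with a different
# adapter may yield more completion-friendly prompts.
dspy.configure(lm=lm, adapter=dspy.ChatAdapter())
```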