stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License

Cookbook for DSPy with multimodal support #1721

Open Ranking666 opened 3 weeks ago

Ranking666 commented 3 weeks ago

Are there any examples that show how to use local multimodal large models, including the data format and the model loading method?

pavan4 commented 3 weeks ago

Check https://github.com/stanfordnlp/dspy/pull/1495

Ranking666 commented 3 weeks ago

Is there a complete example?

isaacbmiller commented 2 weeks ago

For now there isn't a great cookbook. MMMU.ipynb shows the basics.

I am working on a better cookbook. If there is something in particular you want to see, let me know and I can see how to fit it in.

thiccvitalik commented 2 weeks ago

An example with a list of input images, where the list size can vary, would be great.

isaacbmiller commented 2 weeks ago

What's the downstream use case here? (Not disagreeing, just curious.)

thiccvitalik commented 2 weeks ago

My use case is multi-page PDFs where the number of pages can vary, so I can’t predetermine a fixed number of input images (I convert multiple pages to a single image, but this still requires a flexible list size).
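
To make the shape of this request concrete, here is a minimal sketch of the grouping step, using only the Python standard library. The `group_pages` helper, the `pages_per_image` parameter, and the `page_N` placeholder strings are all hypothetical names for illustration; real code would hold rendered page images instead of strings. How the resulting list would be passed to DSPy is not shown, since the thread does not confirm an API for list-valued image inputs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_pages(pages: Sequence[T], pages_per_image: int) -> List[List[T]]:
    """Split pages into consecutive groups of at most `pages_per_image` items.

    Each group stands for the pages that would be merged into one image.
    The number of groups depends on the document length, so it cannot be
    fixed ahead of time.
    """
    if pages_per_image < 1:
        raise ValueError("pages_per_image must be at least 1")
    return [
        list(pages[start:start + pages_per_image])
        for start in range(0, len(pages), pages_per_image)
    ]


# Hypothetical placeholders standing in for rendered PDF pages.
short_doc = [f"page_{n}" for n in range(1, 4)]    # 3 pages
long_doc = [f"page_{n}" for n in range(1, 12)]    # 11 pages

short_groups = group_pages(short_doc, pages_per_image=4)
long_groups = group_pages(long_doc, pages_per_image=4)

print(len(short_groups))  # 1 combined image
print(len(long_groups))   # 3 combined images
```

With four pages per image, the 3-page document yields one combined image and the 11-page document yields three, which is why the input needs to accept a list whose length is only known at run time.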