stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy.ai
MIT License
18.72k stars, 1.43k forks

Cookbook for DSPy with multimodal support #1721

Open Ranking666 opened 1 week ago

Ranking666 commented 1 week ago

Are there any examples that use local multimodal large models, including the expected data format and how to load the model?

pavan4 commented 1 week ago

Check https://github.com/stanfordnlp/dspy/pull/1495

Ranking666 commented 1 week ago

Is there a complete example?

isaacbmiller commented 1 week ago

For now there isn't a great cookbook. MMMU.ipynb shows the basics.

I am working on a better cookbook. If there is something in particular you want to see, let me know and I can see how to fit it in.

thiccvitalik commented 1 week ago

An example with a list of input images, where the list size can vary, would be great.

isaacbmiller commented 1 week ago

What's the downstream use case here? (Not disagreeing, just curious.)

thiccvitalik commented 1 week ago

My use case is multi-page PDFs where the number of pages can vary, so I can’t predetermine a fixed number of input images (I convert multiple pages to a single image, but this still requires a flexible list size).
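To make the variable-length requirement concrete, here is a minimal sketch of the grouping step described above, in plain Python with no DSPy calls. The helper name `group_pages` and the `pages_per_image` parameter are hypothetical and only for illustration. Page objects are stand-in strings; in practice they would be whatever image type the PDF renderer produces.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_pages(pages: Sequence[T], pages_per_image: int) -> List[List[T]]:
    """Split a document's pages into consecutive groups.

    Each group is meant to be merged into one composite image later, so the
    number of groups (and therefore of input images) depends on the document.
    Hypothetical helper, not part of DSPy.
    """
    if pages_per_image < 1:
        raise ValueError("pages_per_image must be at least 1")
    return [
        list(pages[start:start + pages_per_image])
        for start in range(0, len(pages), pages_per_image)
    ]


# Two documents of different lengths yield image lists of different sizes.
short_doc = group_pages(["p1", "p2", "p3"], pages_per_image=2)
long_doc = group_pages([f"p{n}" for n in range(1, 8)], pages_per_image=2)
print(len(short_doc), len(long_doc))
```

Even with several pages merged per image, the resulting list length changes from one PDF to the next, which is why a fixed number of image input fields does not fit this case.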