Closed advay-modal closed 3 weeks ago
🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-7a1075b.modal.run
Optimize the cold start (takes 2 mins now on average)
Shorter than 1 min would be nice. Where are we spending time here? Is it loading the models, indexing the PDFs, or something else?
Optimize the inference time for the chat (currently ~10s). Would try vLLM, but there's an issue with vLLM and the `transformers` version that ColQwen2 needs, so I'd have to build vLLM from source, which would increase build time?
I would suggest not optimizing inference time unless it's a durable improvement. I suspect vllm will resolve this issue soon, so I'd skip working on it for now.
Try to make the app use less memory (I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, so I end up with two model objects)
This might be worth looking into. Getting onto A100-40s (and later L40Ses) would be really nice.
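One way to avoid holding the weights twice is to load the backbone once and have both the retrieval and generation wrappers hold a reference to the same object. A minimal sketch of the idea with stand-in classes (the class names and `weights` attribute are hypothetical, not from this PR; the real wrappers would be the ColQwen2 retriever and the chat model):

```python
class Backbone:
    """Stand-in for the shared model weights (hypothetical)."""
    def __init__(self):
        self.weights = ["w"] * 3  # placeholder for large tensors


class Retriever:
    """Wraps the backbone for embedding/retrieval."""
    def __init__(self, backbone):
        self.backbone = backbone  # a reference, not a copy


class Generator:
    """Wraps the same backbone for chat generation."""
    def __init__(self, backbone):
        self.backbone = backbone  # same reference


backbone = Backbone()           # loaded once, e.g. in a container-enter hook
retriever = Retriever(backbone)
generator = Generator(backbone)

# Both wrappers point at the same object, so the weights are held in memory once.
assert retriever.backbone is generator.backbone
```

Whether this works cleanly here depends on whether the two HF model classes can be constructed around one shared module, which is the part the PR author said they couldn't find a clean way to do.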
Cold start time is coming from loading the models
Have reduced the cold start
What's the status?
@erik-dunteman to pick this one up
This one should be ready to go, pending the following:
Changes since I took the PR over:
Nice work! Will review quickly tomorrow.
lookin good!
@charlesfrye I'd like to disable the keep_warm on this, cool if I make that one-line change?
(edit: below commit does this. Ask forgiveness, not permission)
Noticed that we OOMed at ~5 pages of PDF, so I added batching with batch size 4. The memory allocation is surprisingly high to me, but it's expected for this inference code.
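Batching caps peak memory at roughly one batch's worth of activations rather than all pages at once. A minimal sketch of the batching loop, assuming batch size 4 (the `embed_pages` function and its fake model call are stand-ins, not the PR's actual inference code):

```python
def batched(items, batch_size=4):
    """Yield successive fixed-size chunks so only one batch is in flight at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def embed_pages(pages, batch_size=4):
    """Embed pages batch by batch instead of all at once (stand-in for the model call)."""
    embeddings = []
    for batch in batched(pages, batch_size):
        # The real code would run one forward pass per batch here;
        # peak memory is bounded by `batch_size` pages of activations.
        embeddings.extend(f"emb({page})" for page in batch)
    return embeddings


pages = [f"page{i}" for i in range(10)]
assert len(embed_pages(pages)) == 10  # 4 + 4 + 2
```

The batch size of 4 is just the value the PR landed on empirically for an 80 GB card; it would need re-tuning on smaller GPUs.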
Going past 5 pages also revealed that storing images in a Dict falls apart at a few tens of pages. I moved the image storage onto a Modal Volume.
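On Modal, a Volume is mounted at a path inside the container, so moving the images out of an in-process dict amounts to writing them to and reading them from disk. A rough sketch of that storage layer, using a temp directory in place of the real mount point (the `/images` idea, the filename scheme, and both helper functions are assumptions for illustration):

```python
import tempfile
from pathlib import Path


def save_image(root: Path, page_id: str, data: bytes) -> Path:
    """Persist one page image under the (volume-backed) directory."""
    path = root / f"{page_id}.png"
    path.write_bytes(data)
    return path


def load_image(root: Path, page_id: str) -> bytes:
    """Read a page image back from disk."""
    return (root / f"{page_id}.png").read_bytes()


# Locally we use a temp dir; on Modal, `root` would be the Volume's mount path.
root = Path(tempfile.mkdtemp())
save_image(root, "page-001", b"fake png bytes")
assert load_image(root, "page-001") == b"fake png bytes"
```

Unlike a dict, this scales to many pages without growing the container's memory, and the images survive across container restarts.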
With those enhancements, the model can now answer questions about my dissertation:
Also made some text edits and added a `local_entrypoint` for interfacing via the command line, as pictured above.
Adds a chat with RAG example using the following things:
Some things I would do if I wanted people to actually use this in prod. Curious which of these people think are worth doing:
- Optimize the cold start (takes 2 mins now on average)
- Optimize the inference time for the chat (currently ~10s). Would try vLLM, but there's an issue with vLLM and the `transformers` version that ColQwen2 needs, so I'd have to build vLLM from source, which would increase build time?
- Try to make the app use less memory (I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, and so I end up with 2 model objects)
Type of Change

Checklist

- `lambda-test: false` is added to the example frontmatter (`---`), or `modal run` or an alternative `cmd` is provided in the example frontmatter (e.g. `cmd: ["modal", "deploy"]`)
- `args` are provided in the example frontmatter (e.g. `args: ["--prompt", "Formula for room temperature superconductor:"]`)
- `python_version` is specified for the base image (not `latest`), if it is used
- Dependencies are pinned to at least minor version, `~=x.y.z` or `==x.y`, and dependencies with version < 1 are pinned to patch version, `==0.y.z`
Outside contributors
You're great! Thanks for your contribution.