modal-labs / modal-examples

Examples of programs built using Modal
https://modal.com/docs
MIT License

Add ColQwen2 example #897

Closed: advay-modal closed this 3 weeks ago

advay-modal commented 1 month ago
[Screenshot, 2024-09-30 at 11:03:15 AM]

Adds a chat with RAG example using the following things:

Some things I would do if I wanted people to actually use this in prod. Curious which of these people think are worth doing:

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-7a1075b.modal.run

charlesfrye commented 1 month ago

> Optimize the cold start (takes 2 mins now on average)

Shorter than 1 min would be nice. Where are we spending time here? Is it loading the models, indexing the PDFs, or something else?
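
One way to answer that question is to time each startup phase separately. A minimal sketch, assuming hypothetical `load_models` and `index_pdfs` stand-ins (the real startup functions are not shown in this thread):

```python
import time
from contextlib import contextmanager

# Wall-clock seconds per startup phase, keyed by phase name.
timings = {}


@contextmanager
def timed(phase):
    """Record how long the enclosed block takes under the given phase name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


# Hypothetical stand-ins for the real startup steps.
def load_models():
    time.sleep(0.02)


def index_pdfs():
    time.sleep(0.01)


with timed("load_models"):
    load_models()
with timed("index_pdfs"):
    index_pdfs()

# Report the phase that dominates startup.
slowest = max(timings, key=timings.get)
print(f"slowest phase: {slowest} ({timings[slowest]:.3f}s)")
```

Wrapping each phase this way makes it easy to see which one dominates before deciding what to optimize.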

> Optimize the inference time for the chat (currently ~10 s). I would try vLLM, but there's an incompatibility between vLLM and the transformers version that ColQwen2 needs, so I'd have to build vLLM from source, which I expect would increase build time.

I would suggest not optimizing inference time unless it's a durable improvement. I suspect vLLM will resolve this issue soon, so I'd skip working on it for now.

> Try to make the app use less memory. I currently need an 80 GB A100, largely because, although the underlying model is the same, I couldn't find a clean way to share a single model object, so I end up with two model objects in memory.

This might be worth looking into. Getting onto A100-40s (and later L40Ses) would be really nice.

advay-modal commented 1 month ago

Cold start time is coming from loading the models

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-c97a29f.modal.run

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-e52659d.modal.run

advay-modal commented 1 month ago

Have reduced the cold start

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-f06a9a5.modal.run

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-1aa3914.modal.run

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-c2767f3.modal.run

charlesfrye commented 1 month ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-64c2fd0.modal.run

erikbern commented 4 weeks ago

What's the status?

charlesfrye commented 4 weeks ago

> What's the status?

@erik-dunteman to pick this one up

charlesfrye commented 3 weeks ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-72f967a.modal.run

charlesfrye commented 3 weeks ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-5e0cb53.modal.run

erik-dunteman commented 3 weeks ago

This one should be ready to go, pending the following:

Changes since I took the PR over:

charlesfrye commented 3 weeks ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-b80fa95.modal.run

charlesfrye commented 3 weeks ago

Nice work! Will review quickly tomorrow.

charlesfrye commented 3 weeks ago

lookin good!

[Screenshot, 2024-10-28 at 2:21:31 PM]

erik-dunteman commented 3 weeks ago

@charlesfrye I'd like to disable keep_warm on this. Cool if I make that one-line change?

(edit: below commit does this. Ask forgiveness, not permission)

charlesfrye commented 3 weeks ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-1b7a289.modal.run

charlesfrye commented 3 weeks ago

Noticed that we OOMed at ~5 pages of PDF, so added batching with batch size 4. Somewhat surprising to me that the memory allocation is so high! But it's expected for this inference code.
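
For reference, the batching idea looks roughly like the sketch below. This is not the code from the commit: `embed_pages` and `fake_embed` are hypothetical names, and the fake embedder stands in for the real model call so the example runs without a GPU.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_pages(pages: Sequence[T], embed_batch: Callable, batch_size: int = 4) -> list:
    """Embed pages one batch at a time so peak memory scales with batch_size."""
    embeddings = []
    for batch in batched(pages, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


# Fake embedder (an assumption for illustration) that records batch sizes.
sizes_seen = []


def fake_embed(batch):
    sizes_seen.append(len(batch))
    return [f"emb({page})" for page in batch]


pages = [f"page-{i}" for i in range(10)]
result = embed_pages(pages, fake_embed)
print(sizes_seen)  # [4, 4, 2]
```

Only one batch of pages is passed to the model at a time, so peak memory depends on the batch size rather than on the length of the PDF.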

Going past 5 pages also revealed that storing images in a Dict falls apart at a few tens of pages. I moved the image storage onto a Modal Volume.
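
A rough sketch of the file-based approach, assuming page images are written as PNG files under a directory. The `ImageStore` class and its layout are hypothetical, not the code from the commit; in the app, `root` would presumably be the path where the Modal Volume is mounted, while a temporary directory is used here so the example runs anywhere.

```python
import tempfile
from pathlib import Path


class ImageStore:
    """Keep page images as files on disk instead of values in a key-value store."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id: str, page: int) -> Path:
        # Zero-padded names keep directory listings in page order.
        return self.root / session_id / f"{page:05d}.png"

    def put(self, session_id: str, page: int, png_bytes: bytes) -> Path:
        path = self._path(session_id, page)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(png_bytes)
        return path

    def get(self, session_id: str, page: int) -> bytes:
        return self._path(session_id, page).read_bytes()

    def pages(self, session_id: str) -> list:
        return sorted(int(p.stem) for p in (self.root / session_id).glob("*.png"))


# Stand-in for the Volume mount path (an assumption for illustration).
store = ImageStore(tempfile.mkdtemp())
for i in range(30):
    store.put("session-a", i, b"\x89PNG" + bytes([i]))
```

Each image is read back individually by page number, so retrieval cost does not grow with the size of the document.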

With those enhancements, the model can now answer questions about my dissertation:

[Screenshot, 2024-10-28 at 10:10:44 PM]

Also made some text edits and added a local_entrypoint for interfacing via the command line, as pictured above.

charlesfrye commented 3 weeks ago

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-064ef7d.modal.run