Closed advay-modal closed 3 weeks ago
🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-7a1075b.modal.run
Optimize the cold start (takes 2 mins now on average)
Shorter than 1 min would be nice. Where are we spending time here? Is it loading the models, indexing the PDFs, or something else?
Optimize the inference time for the chat (currently ~10s). Would try vLLM, but there's an issue with vLLM and the `transformers` version that ColQwen2 needs, so I'd have to build vLLM from source, which would increase build time?
I would suggest not optimizing inference time unless it's a durable improvement. I suspect vllm will resolve this issue soon, so I'd skip working on it for now.
Try to make the app use less memory (I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, so I end up with two model objects)
This might be worth looking into. Getting onto A100-40s (and later L40Ses) would be really nice.
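One way to avoid holding the weights twice is to load the backbone once and have both the retrieval and generation wrappers hold a reference to the same object. A minimal sketch of the idea with stand-in classes (the class names and `weights` attribute are hypothetical, not from this PR; the real wrappers would be the ColQwen2 retriever and the chat model):

```python
class Backbone:
    """Stand-in for the shared model weights (hypothetical)."""
    def __init__(self):
        self.weights = ["w"] * 3  # placeholder for large tensors


class Retriever:
    """Wraps the backbone for embedding/retrieval."""
    def __init__(self, backbone):
        self.backbone = backbone  # a reference, not a copy


class Generator:
    """Wraps the same backbone for chat generation."""
    def __init__(self, backbone):
        self.backbone = backbone  # same reference


backbone = Backbone()           # loaded once, e.g. in a container-enter hook
retriever = Retriever(backbone)
generator = Generator(backbone)

# Both wrappers point at the same object, so the weights are held in memory once.
assert retriever.backbone is generator.backbone
```

Whether this works cleanly here depends on whether the two HF model classes can be constructed around one shared module, which is the part the PR author said they couldn't find a clean way to do.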
Cold start time is coming from loading the models
Have reduced the cold start
What's the status?
@erik-dunteman to pick this one up
This one should be ready to go, pending the following:
Changes since I took the PR over:
Nice work! Will review quickly tomorrow.
lookin good!
@charlesfrye I'd like to disable the keep_warm on this, cool if I make that one-line change?
(edit: below commit does this. Ask forgiveness, not permission)
Noticed that we OOMed at ~5 pages of PDF, so I added batching with batch size 4. The memory allocation is surprisingly high to me, but it's expected for this inference code.
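Batching caps peak memory at roughly one batch's worth of activations rather than all pages at once. A minimal sketch of the batching loop, assuming batch size 4 (the `embed_pages` function and its fake model call are stand-ins, not the PR's actual inference code):

```python
def batched(items, batch_size=4):
    """Yield successive fixed-size chunks so only one batch is in flight at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def embed_pages(pages, batch_size=4):
    """Embed pages batch by batch instead of all at once (stand-in for the model call)."""
    embeddings = []
    for batch in batched(pages, batch_size):
        # The real code would run one forward pass per batch here;
        # peak memory is bounded by `batch_size` pages of activations.
        embeddings.extend(f"emb({page})" for page in batch)
    return embeddings


pages = [f"page{i}" for i in range(10)]
assert len(embed_pages(pages)) == 10  # 4 + 4 + 2
```

The batch size of 4 is just the value the PR landed on empirically for an 80 GB card; it would need re-tuning on smaller GPUs.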
Going past 5 pages also revealed that storing images in a Dict falls apart at a few tens of pages. I moved the image storage onto a Modal Volume.
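On Modal, a Volume is mounted at a path inside the container, so moving the images out of an in-process dict amounts to writing them to and reading them from disk. A rough sketch of that storage layer, using a temp directory in place of the real mount point (the `/images` idea, the filename scheme, and both helper functions are assumptions for illustration):

```python
import tempfile
from pathlib import Path


def save_image(root: Path, page_id: str, data: bytes) -> Path:
    """Persist one page image under the (volume-backed) directory."""
    path = root / f"{page_id}.png"
    path.write_bytes(data)
    return path


def load_image(root: Path, page_id: str) -> bytes:
    """Read a page image back from disk."""
    return (root / f"{page_id}.png").read_bytes()


# Locally we use a temp dir; on Modal, `root` would be the Volume's mount path.
root = Path(tempfile.mkdtemp())
save_image(root, "page-001", b"fake png bytes")
assert load_image(root, "page-001") == b"fake png bytes"
```

Unlike a dict, this scales to many pages without growing the container's memory, and the images survive across container restarts.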
With those enhancements, the model can now answer questions about my dissertation:
Also made some text edits and added a `local_entrypoint` for interfacing via the command line, as pictured above.
Adds a chat with RAG example using the following things:
Some things I would do if I wanted people to actually use this in prod. Curious which of these people think are worth doing:
- Optimize the cold start (takes 2 mins now on average)
- Optimize the inference time for the chat (currently ~10s). Would try vLLM, but there's an issue with vLLM and the `transformers` version that ColQwen2 needs, so I'd have to build vLLM from source, which would increase build time?
- Try to make the app use less memory (I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, and so I end up with 2 model objects)
Type of Change

Checklist

- `lambda-test: false` is added to the example frontmatter (`---`), or `modal run` or an alternative `cmd` is provided in the example frontmatter (e.g. `cmd: ["modal", "deploy"]`)
- `args` are provided in the example frontmatter (e.g. `args: ["--prompt", "Formula for room temperature superconductor:"]`)
- `python_version` is specified for the base image (not `latest`), if it is used
- Dependencies are pinned to at least minor version, `~=x.y.z` or `==x.y`, and dependencies with version < 1 are pinned to patch version, `==0.y.z`
Outside contributors
You're great! Thanks for your contribution.