stbaione opened this issue 3 weeks ago
Hang tight for a bit. More tooling is coming that will make this all one command. Building it out for sdxl first.
https://github.com/iree-org/iree/pull/18630#pullrequestreview-2409072569
Ah yes, I was just going to connect those dots too.
For SDXL there are multiple submodels (VAE + UNet + CLIP), so having the build system manage all of them is especially helpful. Ideally we can standardize on a similar set of APIs for llama, SDXL, and future supported models.
Closing as something is already in the works for this
Well we still need code written. Fine to keep this as a tracking issue, blocked on the work happening for SDXL.
Looks like iree.build is merged!
Discussion for this in #373 and #284.
The export script in sharktank was built specifically for llama 3.1 models and has some rough edges: it requires users to chain together CLI commands (`python -m sharktank.examples.export_paged_llm_v1 [--options]`, then `iree-compile [--options]`), it is cumbersome from a user perspective, and it forces CI runs to invoke CLI commands via `subprocess` instead of offering a programmatic, in-memory alternative. A sketch of that subprocess-driven chaining is shown below.
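For reference, a minimal sketch of what the current two-step flow looks like when driven from CI via `subprocess`. The export flag is a placeholder standing in for the real `[--options]` (which depend on the model); only the `iree-compile` flags shown are standard:

```python
import subprocess

def export_and_compile(mlir_path: str, vmfb_path: str) -> None:
    # Step 1: export the llama model to MLIR with the sharktank example script.
    # Real invocations pass model/export flags here (the "[--options]" above);
    # the output flag below is a placeholder, not a documented option.
    subprocess.run(
        ["python", "-m", "sharktank.examples.export_paged_llm_v1",
         f"--output-mlir={mlir_path}"],
        check=True,
    )
    # Step 2: compile the exported MLIR to a .vmfb for the shortfin server.
    subprocess.run(
        ["iree-compile", mlir_path,
         "--iree-hal-target-backends=llvm-cpu",  # or the desired GPU target
         "-o", vmfb_path],
        check=True,
    )
```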
We should find a more general, easier-to-use solution for generating MLIR for LLM models and compiling those models to `.vmfb` for the shortfin server. Below is a starting-point recommendation provided by @ScottTodd:
"Users shouldn't need to chain together
python -m sharktank.examples
. andiree-compile ...
commands. We can aim for something like https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference(that's as minimal as it gets - we'll want to pass options like the compilation target though)"
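For illustration, a rough sketch of the kind of single-call API that recommendation points at. The module, class, and parameter names below are hypothetical (nothing like this exists in sharktank today); the shape mirrors vLLM's offline batched inference while adding the compilation-target knob mentioned above:

```python
# Hypothetical sketch only: `sharktank.LLM` and its parameters do not exist yet.
from sharktank import LLM  # hypothetical in-memory export + compile + run API

llm = LLM(
    "/models/llama3.1-8b.gguf",  # model weights
    target="llvm-cpu",           # compilation target passed through to the compiler
)
outputs = llm.generate(["What is MLIR?", "Summarize IREE in one sentence."])
for out in outputs:
    print(out)
```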