nod-ai / shark-ai

SHARK Inference Modeling and Serving
Apache License 2.0

libshortfin CI and Releases #130

Open stellaraccident opened 2 months ago

stellaraccident commented 2 months ago

Bringing up libshortfin CI and releases

Overall Description

libshortfin is a C++ project providing serving-oriented APIs atop the IREE runtime. It aims to supersede prior pure-Python systems that used IREE via its low-level synchronous API. From the get-go, libshortfin is:

Internally, it consists of three primary components:

Note that we are transitioning to using the containing repository (currently called sharktank) as a monorepo for a variety of related model development and serving projects. As such, plan for some reorganization of how venvs and dev setup are done across the repo (and CI, etc.).

Dependencies

Native deps

libshortfin currently depends purely on the IREE runtime C APIs. In the future, it will also expose an interface to the IREE compiler C API via a delay-loaded stub, allowing the compiler to be dynamically loaded (as some other integrations do). The compiler will remain an optional dep, and pure-inference or deployed systems can leave it off so long as whatever is built on libshortfin does not need it (i.e., if pipelines are pre-compiled).
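As a rough illustration of the delay-loading idea, here is a minimal Python sketch using ctypes. It is only an analogy for the C-level stub described above, not the planned implementation, and the class name and any library name passed to it are hypothetical.

```python
import ctypes
import ctypes.util


class LazyLibrary:
    """Defers loading a shared library until it is first needed."""

    def __init__(self, name):
        # `name` is whatever short library name the caller supplies;
        # nothing here is taken from the IREE or shortfin sources.
        self._name = name
        self._handle = None

    def available(self):
        """Report whether the library can be located, without loading it."""
        return ctypes.util.find_library(self._name) is not None

    def handle(self):
        """Load the library on first use and cache the handle."""
        if self._handle is None:
            path = ctypes.util.find_library(self._name)
            if path is None:
                raise RuntimeError(
                    f"optional library {self._name!r} is not installed"
                )
            self._handle = ctypes.CDLL(path)
        return self._handle
```

A deployment that only runs pre-compiled pipelines would never call `handle()`, so the missing library would never be an error.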

libshortfin currently has the following additional native dependencies:

Python deps

shortfin does not have any Python deps for minimal functionality, but a number of deps are expected to be useful for various applications of it, and it may offer optional features built on these deps for people to use where they are helpful/available:

Variants

The native _shortfin.lib Python package is provided, similar to iree.runtime, by redirecting at runtime to a concrete native library module based on env flags, allowing selection among multiple packaged runtimes such as "default", "tracy", "assert", etc. As with iree.runtime, we expect that our normal wheels will bundle the default and tracy versions so that users can always use an instrumented build.
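A minimal sketch of that kind of runtime redirection might look like the following. The environment variable name and the `_lib_<variant>` module naming are assumptions made for illustration; neither is taken from shortfin or iree.runtime.

```python
import importlib
import os

# Hypothetical names: neither the environment variable nor the module
# naming scheme comes from the shortfin sources.
VARIANT_ENV_VAR = "EXAMPLE_NATIVE_VARIANT"
KNOWN_VARIANTS = ("default", "tracy", "assert")


def variant_module_name(package, environ=None):
    """Return the dotted name of the native module for the requested variant."""
    environ = os.environ if environ is None else environ
    variant = environ.get(VARIANT_ENV_VAR, "default")
    if variant not in KNOWN_VARIANTS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {KNOWN_VARIANTS}"
        )
    return f"{package}._lib_{variant}"


def load_variant(package, environ=None):
    """Import and return the native module selected by the environment."""
    return importlib.import_module(variant_module_name(package, environ))
```

Resolving the name separately from importing it keeps the selection logic testable on machines where no native module is installed.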

Dependent Projects

It is intended that both C++ and pure/native Python extensions will be built atop libshortfin, providing application level support for various models, etc.

Moving towards release

There are several steps towards a robust release of the project.

Initial Packaging and CI

Initial Build Work

See the current README for developer-quality instructions. The build currently just uses find_package for both IREE and other dependencies. A setup.py has been added that works for development, but it needs to be completed to produce a hermetic/production build.

This will necessitate supporting CMake options to enable bundled/pinned deps vs relying on find_package. The trickiest of these is IREE itself, and for that we should support two modes:

These bundling modes can be developed/tested in pure CMake in isolation. Once functional, setup.py can be extended to invoke CMake itself. See IREE runtime/setup.py for inspiration. The result should be that pip install libshortfin/ works on any reasonably modern Linux and Windows system with no further fuss.
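To make the setup.py step concrete, here is a minimal sketch of a helper that assembles and runs the CMake configure and build commands. The `EXAMPLE_BUNDLE_DEPS` option is a hypothetical stand-in for whatever bundled-vs-find_package switch the project ends up defining; the `-S`, `-B`, `--build`, and `--config` flags and the `Python3_EXECUTABLE` hint are standard CMake.

```python
import subprocess
import sys


def cmake_commands(source_dir, build_dir, bundle_deps, build_type="Release"):
    """Return the configure and build command lines as lists of strings."""
    configure = [
        "cmake",
        "-S", str(source_dir),
        "-B", str(build_dir),
        f"-DCMAKE_BUILD_TYPE={build_type}",
        # Hypothetical project option: ON bundles pinned deps,
        # OFF relies on find_package.
        f"-DEXAMPLE_BUNDLE_DEPS={'ON' if bundle_deps else 'OFF'}",
        # Standard FindPython3 hint so CMake targets this interpreter.
        f"-DPython3_EXECUTABLE={sys.executable}",
    ]
    build = ["cmake", "--build", str(build_dir), "--config", build_type]
    return configure, build


def run_cmake(source_dir, build_dir, bundle_deps):
    """Run configure, then build, raising if either step fails."""
    for command in cmake_commands(source_dir, build_dir, bundle_deps):
        subprocess.run(command, check=True)
```

A setuptools build_ext override could call `run_cmake` with `bundle_deps=True` for release wheels and leave it off for development installs.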

There are a few things that should be done at this point:

Initial CI work

CI should be brought up to:

Note that the project is self-contained enough that it should build just fine on free Linux and Windows runners.

Versioning and Releasing

Releasing libshortfin properly will necessitate working out the versioning and releasing policies for some of its related deps (IREE, iree-turbine, sharktank, etc). This may be the end of the road for the nightly builds of those, and we may want to take this chance to come up with a real versioning policy and apply it uniformly to all of the projects. It would be great to have normal pre-releases and regular releases in stable places where everything can be installed by adding one index-url (vs the current state where everything is somewhere bespoke with all kinds of -f flags).

We also get requests for conda packages from some in the community, and this would be a good time to enable those.

ScottTodd commented 1 month ago

For the package builds, do we want to use manylinux?

Dockerfile choices:

stellaraccident commented 1 month ago

You always want to use a manylinux image to build Python packages that will be run on any other OS. It's typically easier to just do that from the get-go. I almost always use one that has been forked locally and extended with needed packages.

ScottTodd commented 1 month ago

Can we yank the old shortfin releases (https://pypi.org/project/shortfin/#history)? That only has version 0.1.dev3, for Python 3 (including pre-3.12).

If I try running python -m pip install shortfin -f https://github.com/ScottTodd/SHARK-Platform/releases/expanded_assets/dev-wheels from Python 3.10, that gets the stale package, since the new packages are only for Python 3.12 and 3.13.

ScottTodd commented 1 month ago

Bah, https://peps.python.org/pep-0440/#handling-of-pre-releases

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release.
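The effect of that rule can be sketched in a few lines of Python. This is a deliberately simplified model of the PEP 440 default, not pip's resolver: the pre-release detection covers only the common `aN`, `bN`, `rcN`, and `.devN` spellings, and the `1.0` version in the usage note is hypothetical.

```python
import re

# Simplified detection of pre-release segments (aN, bN, rcN, .devN).
# Alternate spellings allowed by PEP 440 are ignored in this sketch.
_PRE_RELEASE = re.compile(r"(a|b|rc|\.dev)\d+")


def is_pre_release(version):
    """Return True if the version string looks like a pre-release."""
    return _PRE_RELEASE.search(version) is not None


def installable(candidates, allow_pre=False):
    """Skip pre-releases unless requested or nothing else is available."""
    if allow_pre:
        return list(candidates)
    finals = [v for v in candidates if not is_pre_release(v)]
    # Fall back to pre-releases only when no final release satisfies.
    return finals if finals else list(candidates)
```

With only `["0.1.dev3"]` compatible with the interpreter, `installable` returns that pre-release; once a hypothetical final `1.0` is also compatible, the dev release is excluded.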

That old 0.1.dev3 pre-release package for https://pypi.org/project/shortfin/ (and similarly for https://pypi.org/project/sharktank/) is making testing new packaging tricky:

ScottTodd commented 1 month ago

Brain dump before vacation (could merge these into a single tasklist in the original issue comment):

TODO: