jozu-ai / kitops

Tools for easing the handoff between AI/ML and App/SRE teams.
https://KitOps.ml
Apache License 2.0

PoC: Inference support using Triton #357

Open gorkem opened 2 weeks ago

gorkem commented 2 weeks ago

Describe the problem you're trying to solve

Build a proof-of-concept (PoC) generic inference container that uses Triton as the inference engine and can download and use a ModelKit as efficiently as possible.
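One possible shape for the PoC, sketched below as a container definition. This is only an illustration under assumptions: the Triton image tag, the `MODELKIT_REF` environment variable, and the way the `kit` CLI gets into the image are all placeholders, not decisions; the `kit unpack` invocation is based on the current CLI but should be verified against the docs.

```dockerfile
# Sketch only — image tag, env-var name, and kit install method are assumptions.
FROM nvcr.io/nvidia/tritonserver:24.05-py3

# Assume the kit CLI binary is provided at build time.
COPY kit /usr/local/bin/kit

# At startup: unpack the ModelKit referenced by $MODELKIT_REF into Triton's
# model repository, then start Triton pointed at it.
CMD ["sh", "-c", "kit unpack \"$MODELKIT_REF\" --dir /models && exec tritonserver --model-repository=/models"]
```

Fetching at startup (rather than at build time) keeps the image generic: the same container can serve any ModelKit by changing an environment variable.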

Describe the solution you'd like

Describe alternatives you've considered

  1. Baking artifacts into the container:
     • Considered baking the artifacts directly into the container, but this approach lacks flexibility and can lead to larger container sizes.
  2. External model storage:
     • Using external storage solutions to host the models and mount them at runtime. This adds complexity and potential latency.
  3. On-demand model fetching:
     • Fetching models on-demand during inference requests. This could introduce latency during the initial request.

Additional context