
Support Batch and Cached Predictions #732

Open franciscojavierarceo opened 1 month ago

franciscojavierarceo commented 1 month ago

Issue

Kubeflow should provide some guidance on serving the following two types of model predictions online:

  1. Precomputed Predictions
  2. Cached Predictions

(1) requires retrieving a precomputed score from an online database. (2) requires recomputing the score dynamically (e.g., calling a KServe endpoint), retrieving the precomputed score from an online database, and updating the score in some way (e.g., when data changes from the upstream data producer).

Definitions

Precomputed Predictions: We define precomputed predictions as predictions for a sample of n observations (e.g., n users), computed as a batch process. Example: a risk-score prediction computed for all n users at a point in time and stored in some file (e.g., parquet/csv).

Cached Predictions: We define cached predictions as predictions computed online (i.e., via a request), cached in a database, and updated and persisted on some asynchronous schedule, independent of how often the prediction is used. Example: a risk-score prediction computed for a single user, stored in an online database, and refreshed as features in the model change, independent of the usage of the risk score (i.e., a client's API call).
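
To make the precomputed case concrete, here's a minimal sketch of serving such a prediction with Feast, assuming a batch job has already materialized the scores into the online store. The `risk_scores` feature view and `user_id` entity are hypothetical names used for illustration, not an existing setup:

```python
from feast import FeatureStore

# Look up a precomputed risk score that a batch job already wrote to the online store.
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=["risk_scores:risk_score"],   # hypothetical feature view : feature
    entity_rows=[{"user_id": 12345}],      # hypothetical entity key
).to_dict()

print(features["risk_score"][0])  # the precomputed prediction for this user
```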

Options Available

I believe there are at least 3 ways this can be done:

  1. KServe orchestrates a call to Feast and a KServe endpoint
    • This is how the KServe/Feast demo operates today, but it's not meant for Batch Models
      • Modifying it to support Batch Models is straightforward, but it'd basically just be a call to Feast
  2. Feast orchestrates a call to a KServe endpoint
    • I have created a small demo of how this could be done in Feast using an On Demand Feature View (instead of an ODFV, we could call an endpoint in KServe) and this would satisfy both (1) and (2) — see the sketch after this list
  3. Create a new library to handle orchestrating the calls between Feast and KServe
    • This would be a light-weight library that would be similar in spirit to my demo but outside of Feast
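
As a rough sketch of what option 2 could look like: a Feast On Demand Feature View whose transformation calls out to a KServe endpoint rather than computing the feature locally. The endpoint URL, source, feature view, and field names below are hypothetical, and it assumes KServe's v1 predict protocol:

```python
import pandas as pd
import requests
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64, Int64

# Hypothetical request source carrying the raw inputs sent by the caller.
model_inputs = RequestSource(
    name="model_inputs",
    schema=[Field(name="user_id", dtype=Int64), Field(name="balance", dtype=Float64)],
)

# Hypothetical KServe inference endpoint (v1 predict protocol).
KSERVE_URL = "http://risk-model.models.example.com/v1/models/risk-model:predict"

@on_demand_feature_view(
    sources=[model_inputs],
    schema=[Field(name="risk_score", dtype=Float64)],
)
def risk_score_odfv(inputs: pd.DataFrame) -> pd.DataFrame:
    # Instead of a purely local transformation, call out to KServe for a fresh score.
    resp = requests.post(KSERVE_URL, json={"instances": inputs[["balance"]].values.tolist()})
    resp.raise_for_status()
    return pd.DataFrame({"risk_score": resp.json()["predictions"]})
```

The trade-off here is that Feast becomes the single entry point for clients, at the cost of Feast making a network call to KServe on the request path.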

There are pros and cons to each, and it would be good to work through them with the Kubeflow community to reach a consensus.

Feedback from the Community

I think the recommended solution may end up depending on the needs of the users. Having KServe call Feast requires an additional network call, but it is a more intuitive architecture. Getting feedback from the community would be great here to discuss the tradeoffs and make an informed, collaborative choice. An ideal outcome would be to incorporate this feedback into the Kubeflow documentation.

Additional Context

There are several discussions that have led me to open this issue: see this issue in Feast here, this issue in Kubeflow/kubeflow, and this blog post by Databricks.

terrytangyuan commented 1 month ago

Could you clarify what you meant by "batch models" and "cached predictions"?

franciscojavierarceo commented 1 month ago

@terrytangyuan added some definitions, let me know if you'd like additional context or info! Happy to try to make this clearer. šŸ‘

franciscojavierarceo commented 1 month ago

One idea from @thesuperzapper is that we could make a blog post instead about this, which I like as an option before doing an implementation.

@thesuperzapper if you're aligned with this, I can take this on as a starting point. I'd limit the scope to just serving batch predictions as a feature. What do you think?

thesuperzapper commented 1 month ago

> One idea from @thesuperzapper is that we could make a blog post instead about this, which I like as an option before doing an implementation.
>
> @thesuperzapper if you're aligned with this, I can take this on as a starting point. I'd limit the scope to just serving batch predictions as a feature. What do you think?

@franciscojavierarceo I'm still not entirely sure what you would be needing from the Kubeflow community here.

What would your blog post be explaining or demonstrating specifically?

In general, I am very on board with having a good case study for batch feature serving with Kubeflow. Although I am not quite sure I understand how KServe comes into it, as KServe is about serving REST endpoints, which are not typically associated with batch inference.

franciscojavierarceo commented 1 month ago

> @franciscojavierarceo I'm still not entirely sure what you would be needing from the Kubeflow community here.

I'm asking whether folks would be supportive of me creating a demo or documentation outlining how users can serve batch computed predictions* using Feast; if not, then I won't spend the time building the demo, as it will require meaningful effort. I would want to add it to the website to highlight the behavior.

> What would your blog post be explaining or demonstrating specifically?

*Concretely, if an MLE wanted to serve predictions online that were created by an ML Model as the output of some scheduled job (e.g., a KFP pipeline), they could do so using Feast (in this case the prediction is a feature).
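
As a rough sketch of that flow (all paths, entity, and feature names here are hypothetical): the scheduled job writes its scores to a file, a Feast feature view treats each score as a feature, and a materialization step loads the latest batch into the online store:

```python
from datetime import datetime, timedelta

from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float64

# Hypothetical entity and source; the parquet file is the output of the scheduled job.
user = Entity(name="user_id", join_keys=["user_id"])

batch_scores_source = FileSource(
    path="s3://my-bucket/batch_risk_scores.parquet",
    timestamp_field="event_timestamp",
)

# The batch prediction is modeled as an ordinary feature.
risk_scores = FeatureView(
    name="risk_scores",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="risk_score", dtype=Float64)],
    source=batch_scores_source,
)

# After `feast apply` registers the view, load the latest batch of scores online:
store = FeatureStore(repo_path=".")
store.materialize_incremental(end_date=datetime.utcnow())
```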

> In general, I am very on board with having a good case study for batch feature serving with Kubeflow. Although I am not quite sure I understand how KServe comes into it, as KServe is about serving REST endpoints, which are not typically associated with batch inference.

Awesome, glad we're aligned. Yeah, KServe does not come into the batch-only use case.

KServe does come into it when you want to calculate a score in real-time, cache it, update it only when the data changes, and initialize the cache from a batch set of data. This use case is a little complicated but will offer much lower latency for serving ML Models, which is why I think it's useful.
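
A minimal sketch of that cached-prediction flow, assuming the Feast online store is used as the cache, a recent Feast version that exposes write_to_online_store, and a hypothetical KServe endpoint and feature view (a separate async job would handle refreshing scores when upstream data changes):

```python
from datetime import datetime

import pandas as pd
import requests
from feast import FeatureStore

store = FeatureStore(repo_path=".")
# Hypothetical KServe inference endpoint (v1 predict protocol).
KSERVE_URL = "http://risk-model.models.example.com/v1/models/risk-model:predict"

def get_risk_score(user_id: int, features: list[float]) -> float:
    # 1. Try the cache (the Feast online store, possibly initialized from batch data).
    cached = store.get_online_features(
        features=["risk_scores:risk_score"],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    score = cached["risk_score"][0]
    if score is not None:
        return score  # cache hit: reuse the previously computed prediction

    # 2. Cache miss: compute the score in real time via KServe.
    resp = requests.post(KSERVE_URL, json={"instances": [features]})
    resp.raise_for_status()
    score = resp.json()["predictions"][0]

    # 3. Persist it so subsequent calls hit the cache.
    store.write_to_online_store(
        feature_view_name="risk_scores",
        df=pd.DataFrame(
            {
                "user_id": [user_id],
                "risk_score": [score],
                "event_timestamp": [datetime.utcnow()],
            }
        ),
    )
    return score
```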

Also, thanks for reviewing! Let me know if you have additional feedback. šŸ‘

franciscojavierarceo commented 1 month ago

@andreyvelich let me know if you have any thoughts here.

andreyvelich commented 1 month ago

Thank you for this @franciscojavierarceo! I would love to hear some thoughts from KServe folks on whether this is something that their users are interested in (cc @yuzisun @sivanantha321 @johnugeorge), especially if we have any gaps with feature serving using Feast + KServe.

If we feel that we should just explain the case study of how to achieve it, we should create a Kubeflow blog post about it, as @thesuperzapper suggested: https://blog.kubeflow.org/.

franciscojavierarceo commented 2 weeks ago

@andreyvelich @yuzisun @sivanantha321 @johnugeorge @thesuperzapper and @terrytangyuan I updated the Feast documentation to outline how this can work in Feast here: https://docs.feast.dev/v/master/getting-started/architecture/model-inference.

This doesn't mention KServe explicitly as it's meant to be from the Feast perspective (i.e., inference-approach agnostic). It would be ideal to incorporate similar documentation into Kubeflow to outline the tradeoffs of different structures (e.g., a KServe-centric client or a completely separate client orchestrating both KServe and Feast).