picnicml / doddle-serving

:lollipop: Serving system for trained doddle-models
Apache License 2.0

Initial discussion #1

Open · inejc opened 5 years ago

inejc commented 5 years ago

This issue is a continuation of this Reddit question and serves as an initial discussion regarding design choices and collaboration on this project.

High-level requirements of the project:

Technology Candidates

Resources:

inejc commented 5 years ago

The first step should be to decide on the tech stack. I should say that I don't have any experience with fs2 or monix and only very limited experience with akka, so I'm very interested in hearing opinions on that.

SentinelCyberSecurityMunich commented 5 years ago

👍 Sounds good. I've worked with Akka Streams and graphs, so I can offer help there. I'd also be interested in the latest technologies regarding AI, like TensorFlow. Thank you!

h0ffmann commented 5 years ago

I'm interested in contributing. I've only worked with Akka so far, but I'm open to using new libraries, especially ones that lean on more FP concepts, like monix or fs2.

See also https://github.com/Azure/mmlspark/blob/master/notebooks/samples/SparkServing%20-%20Deploying%20a%20Classifier.ipynb

They don't have Scala examples yet, but the objective is the same. I think we can pick up some good ideas.

ashwinbhaskar commented 5 years ago

I would like to know the reasons for the suggested choices. If it's a simple REST API, then we don't need streaming. What use cases (in detail) are you expecting the REST API to solve?

inejc commented 5 years ago

Sorry for the delay, I'm AFK for a few days. I'll respond with detailed examples on Monday.

brurucy commented 5 years ago

I vote for akka.

inejc commented 5 years ago

I'll try to capture the requirements by presenting a scenario in which we want to productionize a simple predictive model.

1. Model Example

Let's assume we want to serve a model that will help doctors predict the weight of a person based on their height, gender, and age. The CSV below could be a dataset used to fit such a model:

height,is_female,age,weight
179,0,27,83
170,1,24,61
185,0,23,88
190,0,21,94
164,1,23,55

Once we fit the model on the data above, we can use it to make predictions for new examples without the weight information, e.g.:

height,is_female,age
171,0,25
169,1,23
183,0,23
192,0,19
158,1,29

In other words, we can think of the fitted model as a function that maps height, gender, and age to weight: weight = f([height, is_female, age]). Take a look at the fit and predict methods in the predictor typeclass to see how this is exposed in doddle-model; in our case height, gender, and age are called features and weight is called the target.
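To make the shape of that API concrete, here is a minimal sketch of such a typeclass, assuming Breeze types; the object, alias, and trait names are illustrative, so see the actual doddle-model source for the real signatures:

import breeze.linalg.{DenseMatrix, DenseVector}

object PredictorSketch {
  // In doddle-model, Features is a matrix (one row per example) and Target a vector of Double.
  type Features = DenseMatrix[Double]
  type Target = DenseVector[Double]

  // Simplified predictor typeclass: fit maps an untrained model and training data to a
  // trained model, predict maps a trained model and new features to predicted targets.
  trait Predictor[A] {
    def fit(model: A, x: Features, y: Target): A
    def predict(model: A, x: Features): Target
  }
}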

See this to realize that Features is in fact a matrix and Target a vector of Double. This kind of API allows for fitting and predicting on multiple instances (rows in the CSV files) at once. If we take a look at the first CSV file again (the one used to fit the model), we can see that all columns except the last one constitute the features matrix, i.e.:

179,0,27
170,1,24
185,0,23
190,0,21
164,1,23

and from the last column we construct the target vector, i.e.:

83
61
88
94
55

The main reason for putting multiple examples together before calling fit and predict is that, because of the nature of the underlying implementation, it is in most cases much faster to do the calculations that way than to do them on each row separately in a loop (i.e. with predict exposed as predict(model: A, x: DenseVector[Double]), where x is a single row from the second CSV file).
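As an illustration (not the actual doddle-model internals), a hypothetical trained linear model with weight vector w could predict all rows with a single matrix-vector multiplication instead of a per-row loop:

import breeze.linalg.{DenseMatrix, DenseVector}

object BatchVsLoop extends App {
  // Hypothetical weights for height, is_female, and age.
  val w = DenseVector(0.5, -8.0, 0.1)

  // Features matrix built from the second CSV file (one row per instance).
  val x = DenseMatrix(
    (171.0, 0.0, 25.0),
    (169.0, 1.0, 23.0),
    (183.0, 0.0, 23.0))

  // Batched: one matrix-vector multiplication over all rows.
  val batched = x * w

  // Row-by-row: same results, but typically much slower for many rows.
  val looped = (0 until x.rows).map(i => x(i, ::).t dot w)

  println(s"batched: $batched, looped: $looped")
}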

2. Model Serving

Now let's assume we want to expose the fitted (trained) model to hundreds of hospitals, meaning that thousands of different doctors will be using it every hour. Essentially, each of them will be calling a REST endpoint (GET /model/<model-name>/predict/) for a single patient that they are currently dealing with (a single vector constructed from height, gender, and age measurements). The JSON request data might look something like:

{
  "instances": [
    [171,0,25]
  ]
}

In order for us to be able to serve predictions (weight) as efficiently as possible, we need to put vectors from different doctors/patients into a matrix (as rows) and only then call the predict function on the model.

Notice that serving (generating predictions) is completely stateless and can thus easily be scaled horizontally, i.e. we can serve the same model on multiple cores/machines.

Additionally, we sometimes need to serve the same model trained on different datasets (or even with different features; e.g. we add a measurement related to body fat) and then compare their live performance with A/B tests to figure out which version works better in practice. We should thus be able to specify the version when calling the REST endpoint as well (GET /model/<model-name>/version/<version-identifier>/predict/).
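For completeness, a response to such a predict call might look something like the JSON below (the field name is purely illustrative; the actual contract still needs to be agreed on):

{
  "predictions": [
    83.2
  ]
}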

inejc commented 5 years ago

Following the example from the previous comment, some of the doctors could collect measurements from multiple patients and send more instances at once, e.g. the JSON request data might look like:

{
  "instances": [
    [171,0,25],
    [169,1,23],
    [183,0,23]
  ]
}

Despite that, the serving system should still batch instances from multiple requests together. In cases where models would be served in environments with enough memory, larger batch sizes should increase performance in high-load scenarios. Batch size and maximum latency should be configurable, similar to the groupedWithin operator from Akka Streams.
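A minimal sketch of that batching behaviour, assuming Akka Streams 2.6 (with implicit materialization from the actor system) is chosen; the source elements, batch size, and latency values are all illustrative, and the map stage is a placeholder for the actual predict call on the assembled matrix:

import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object MicroBatchingSketch extends App {
  implicit val system: ActorSystem = ActorSystem("doddle-serving")

  // Hypothetical stream where each element is one instance (a feature row) from some request.
  val instances = Source(List(
    Vector(171.0, 0.0, 25.0),
    Vector(169.0, 1.0, 23.0),
    Vector(183.0, 0.0, 23.0)))

  // Emit a batch when 32 instances accumulate or 50 ms pass, whichever happens first.
  instances
    .groupedWithin(32, 50.millis)
    .map(batch => s"predicting on a batch of ${batch.size} instances")
    .runWith(Sink.foreach(println))
}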

inejc commented 5 years ago

3. Model Deployment

It should also be possible to deploy new models and versions without changing any code, i.e. via a REST API. E.g. by calling something like POST /model/create/ or POST /model/<model-name>/version/create/.

We could load serialized models from Google Cloud Storage buckets and from other locations (whatever storage we decide to support).
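For illustration only, a POST /model/<model-name>/version/create/ payload could reference such a storage location (the field names and the bucket path are hypothetical):

{
  "version": "v2",
  "source": "gs://example-bucket/models/weight-predictor-v2.model"
}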

I think a central type for the things that can be "served" should be something like Servable (similar to how TensorFlow Serving is designed). We would then provide implementations for doddle-models, but it should also be possible for users to provide their own implementations for custom servables, e.g. a servable that uses a trained neural network (see tensorflow_scala) to generate features from raw text (embeddings) and then a trained doddle-model that makes predictions based on those embeddings.
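A rough sketch of what such an abstraction could look like, assuming Breeze types; the trait name comes from the TensorFlow Serving analogy above, and the method signatures are just a starting point for discussion:

import breeze.linalg.{DenseMatrix, DenseVector}

// Hypothetical central typeclass for anything that can be served.
trait Servable[A] {
  // Load a serialized servable from some storage location, e.g. a GCS bucket.
  def load(path: String): A
  // Generate predictions for a batch of instances (one row per instance).
  def predict(servable: A, x: DenseMatrix[Double]): DenseVector[Double]
}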

I hope this makes things a bit clearer. Let me know if anything above doesn't make sense or more info is required. In any case, I'm looking forward to continuing this discussion.

ashwinbhaskar commented 5 years ago

Can I join this project?

inejc commented 5 years ago

@ashwinbhaskar of course, no work has been done on it yet.

ashwinbhaskar commented 5 years ago

Is there any specific task that you want me to do?

inejc commented 5 years ago

I think the first "task" is to design the architecture of the library based on the requirements above (if anything is unclear, I'm happy to go into more details) and then think about the roadmap and the actual tasks based on that. We could do that in the Wiki section of this project, i.e. here.

ashwinbhaskar commented 5 years ago

@inejc Sorry for the late reply. I read through your examples of Model Example, Model Serving, and Model Deployment. They are solid. Just to check that my understanding is correct: when you spoke about exposing GET /model/<model-name>/version/<version-identifier>/predict/, you want this API to return the targets, right?

ashwinbhaskar commented 5 years ago

@inejc Given that my understanding is correct, we should start by designing and finalising the API contract and then work our way down towards the core. What do you think?

inejc commented 5 years ago

@ashwinbhaskar no worries, this is a potential hobby project so there shouldn't really be any deadlines 🙂.

You are right, GET /model/<model-name>/version/<version-identifier>/predict/ must return the target (a.k.a. the predicted weight).

Agreeing on the API first seems very reasonable to me. Should we write a Wiki entry with what endpoints we want to have and the payload format?

ashwinbhaskar commented 5 years ago

@inejc hmmm... yes, let's write a wiki entry! Can you start by writing a skeleton? I will add to it.

ashwinbhaskar commented 5 years ago

@inejc Or do you want me to start?

inejc commented 5 years ago

@ashwinbhaskar feel free to start the document 👍. If you want to wait, I should be able to help by the end of the week.

ashwinbhaskar commented 5 years ago

@inejc Sure, I will give it a go. But before that, I wanted to discuss the fit method here. The fit method takes a model as an argument and returns a model of the same type. What is the purpose of the model being passed as an argument?

inejc commented 5 years ago

@ashwinbhaskar estimators in doddle-model are implemented using typeclasses. If you are not familiar with them, you can take a look at:

You can find the basic doddle-model typeclasses here. For example, if one wants to add a new classification algorithm, an instance of the Classifier typeclass needs to be implemented as evidence that a particular (case) class provides the promised functionality. Here is an example of how this is done for the most frequent (dummy) classifier.
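Roughly, the pattern looks like the sketch below (a simplified stand-in, not the actual doddle-model code), where the implicit value is the evidence that MostFrequent behaves like a classifier:

import breeze.linalg.{DenseMatrix, DenseVector}

object ClassifierSketch {
  // Simplified stand-in for the doddle-model Classifier typeclass.
  trait Classifier[A] {
    def fit(model: A, x: DenseMatrix[Double], y: DenseVector[Double]): A
    def predict(model: A, x: DenseMatrix[Double]): DenseVector[Double]
  }

  // A dummy classifier that always predicts the most frequent label seen during fit.
  final case class MostFrequent(label: Option[Double])

  implicit val mostFrequentClassifier: Classifier[MostFrequent] =
    new Classifier[MostFrequent] {
      def fit(model: MostFrequent, x: DenseMatrix[Double], y: DenseVector[Double]): MostFrequent =
        MostFrequent(Some(y.toArray.groupBy(identity).maxBy(_._2.length)._1))
      def predict(model: MostFrequent, x: DenseMatrix[Double]): DenseVector[Double] =
        DenseVector.fill(x.rows)(model.label.getOrElse(sys.error("predict called on an unfitted model")))
    }
}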

Don't hesitate to ask if you need any more help.

ashwinbhaskar commented 5 years ago

@inejc hmmm... I am aware of typeclasses. I guess I still don't understand what exactly a model is. My understanding was that a model can be described using features and corresponding targets. Once such a model is formed, you can query it for targets by giving features as an input to the model [predictor method].

Is my understanding correct? If it is, then what is the need to introduce a model as an argument to the fit method? Shouldn't fit construct and return a model when given features and targets?

inejc commented 5 years ago

@ashwinbhaskar a model is described by some model-specific configuration (like lambda here). Once such a model is constructed, it is deemed untrained (unfitted): it exists without having seen any training features and targets, so it can't be used for prediction yet, but it is still a model that describes part of a future, trained model. Configuration like lambda is called a hyperparameter in ML literature and is not changed during training; in addition, the model has internal parameters (regular parameters, if you want) that will be changed during training. Once we show features and targets to the untrained model, we can map it to a trained model by changing the aforementioned internal parameters. I hope that clears things up a bit; if not, let me know and I will do my best to provide a better explanation.

You are correct in saying that once the model is trained (it was shown features and targets) one can query it by giving some new features and it will return the predicted target.
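A toy sketch of that distinction, assuming Breeze types (the class name, the placeholder weights, and the elided math are all hypothetical):

import breeze.linalg.{DenseMatrix, DenseVector}

object UntrainedVsTrained {
  // lambda is a hyperparameter: fixed at construction time, never changed by fit.
  // weights are internal parameters: empty before training, filled in by fit.
  final case class RidgeLike(lambda: Double, weights: Option[DenseVector[Double]] = None)

  // Maps an untrained model to a trained one by computing weights; the actual math is
  // elided here, only the shape of the transformation matters.
  def fit(model: RidgeLike, x: DenseMatrix[Double], y: DenseVector[Double]): RidgeLike =
    model.copy(weights = Some(DenseVector.zeros[Double](x.cols)))

  val untrained = RidgeLike(lambda = 0.5) // exists, but can't be used for prediction yet
}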

ashwinbhaskar commented 5 years ago

@inejc ah, okay. I think I understand it at a zoomed-out level now. Thank you for taking the time to explain 😄 I will look into configuration like lambda and try to understand its significance. Is there any reference or link that explains it in a concise way?

ashwinbhaskar commented 5 years ago

@inejc Also, with Dotty coming up, you don't have to expose typeclasses this way; you can use extension methods when defining them. A friend and I have put together the scala-to-dotty repo. Though it is still under development, we have added a few things, like the Dotty way of implementing typeclasses and implicits [using delegates].

inejc commented 5 years ago

Two short references that explain the difference between parameters and hyperparameters, which together constitute the state of the model:

Thanks for pointing to the Dotty example 🙂. I still need to do more in-depth reading on the Dotty changes, so the examples will be useful. I also need to make the project compile for 2.13 first, though (https://github.com/picnicml/doddle-model/pull/71).

ashwinbhaskar commented 5 years ago

@inejc hmmm... let me know if you would like me to contribute by fixing any issues or anything else. Happy to help 😄

inejc commented 5 years ago

@ashwinbhaskar that would be awesome if you are interested; any kind of contribution is much appreciated 😄. You should be able to find some open issues. Let me know if you need any help.

ashwinbhaskar commented 5 years ago

@inejc I have added the API contract for predict here. Can you have a look at it, please?

ashwinbhaskar commented 5 years ago

@inejc hey, did you get a chance to go through the API contract?