zeno-ml / zeno

AI Data Management & Evaluation Platform
https://zenoml.com
MIT License

Complex model returns beyond output #629

Closed · xnought closed this 1 year ago

xnought commented 1 year ago

Zeno is cool because you get everything (the user interfaces) for free. All you need to do is define the decorators and the config.

Another way to compare models is speed. If a model performs worse but is 1000x faster, I might still consider using the worse-performing model.

Since we already run predictions on the data in the backend via the model decorator, why not time them? That data could then be surfaced in the report view.

Should we have a speed metric?
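
Something like this inside the model function would be enough to capture per-example latency; just a rough sketch with made-up names (predict_fn, instances), not Zeno's actual API:

import time

def predict_with_timing(predict_fn, instances):
    # run the model and record per-example latency in milliseconds
    outputs, timings_ms = [], []
    for instance in instances:
        start = time.perf_counter()
        outputs.append(predict_fn(instance))
        timings_ms.append((time.perf_counter() - start) * 1000)
    return outputs, timings_ms

The timings list could then be fed back as a metadata column for the report view.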

xnought commented 1 year ago

@cabreraalex what do you think?

cabreraalex commented 1 year ago

Oh, this is awesome. And a very big issue for models in deployment.

I think more generally we want to be able to return more than just the output from the @model functions. I'm not sure what the best way to do that is. Right now we have (output, embedding). We could add a third element, a dictionary of new metadata columns/distributions? Seems a bit messy though. Or we replace the second element with a dictionary and make the embedding one of its entries?
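
Roughly something like this, as a sketch (the stand-in model and the column names are made up):

def predict(instances):
    outputs = [len(x) for x in instances]          # stand-in model output
    embeddings = [[0.0, 0.0] for _ in instances]   # stand-in embeddings
    extra_columns = {
        "inference_time_ms": [1.0 for _ in instances],  # new metadata column
        "confidence": [0.5 for _ in instances],         # another derived column
    }
    return outputs, embeddings, extra_columns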

xnought commented 1 year ago

That's a great point of discussion.

At that point why not make the entire return a dictionary so the return value is completely readable?

return_value = {
    "predictions": [],
    "embeddings": [],

    # then other stuff they want, where the key is the new column name
    "probabilities_per_prediction": [],
    "timing": [],
}

or you could make the user build up the return value in a class called ModelBatch

from zeno import ModelBatch

return_value = (
    ModelBatch()
    .savePredictions(preds)
    .saveEmbeddings(embed)
    .saveOther("times", times_milliseconds)
    .saveOther("probs", probabilities)
)

and have the user return that

the ModelBatch could alternatively be passed as another parameter to the decorated function, and the user just saves stuff onto it and returns it
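
For reference, a minimal sketch of what ModelBatch might look like internally (method names copied from above; the implementation is entirely hypothetical):

class ModelBatch:
    # hypothetical builder that just collects named output columns
    def __init__(self):
        self.columns = {}

    def savePredictions(self, preds):
        self.columns["predictions"] = preds
        return self  # return self so calls can be chained

    def saveEmbeddings(self, embeds):
        self.columns["embeddings"] = embeds
        return self

    def saveOther(self, name, values):
        self.columns[name] = values
        return self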

cabreraalex commented 1 year ago

Oof, that's a lot of complexity... we could maybe make it both options? EITHER you just return the model output OR you do something more complicated like the dictionary return or the object-builder pattern?

xnought commented 1 year ago

Fair point on complexity. I think perhaps your (output, embedding, dictionary) idea might be best then.

cabreraalex commented 1 year ago

I think that's maybe even more complex. I think we either A. send it and make it a return object like you proposed, or B. keep it backwards-compatible and allow either simple single-return outputs OR an object.

xnought commented 1 year ago

I'm struck with indecision here. Perhaps this could be a group discussion point on Monday

xnought commented 1 year ago

After reflection, I like option B.

But I would keep the optional embeddings for backwards compatibility.

So they can return just predictions, or a tuple (predictions, embeddings) where embeddings are optional (like we have it now), or they return the ModelBatch thing like I had above.
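
A rough sketch of how the backend could normalize those three return shapes (assuming the hypothetical ModelBatch sketched above):

def normalize_model_return(result):
    # accept plain predictions, (predictions, embeddings), or a ModelBatch,
    # and normalize everything to a dict of named columns
    if isinstance(result, ModelBatch):
        return result.columns
    if isinstance(result, tuple):
        predictions, embeddings = result
        return {"predictions": predictions, "embeddings": embeddings}
    return {"predictions": result}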