pyvespa model integration

vespa-engine / pyvespa

Python API for https://vespa.ai, the open big data serving engine

https://pyvespa.readthedocs.io/

Apache License 2.0

97 stars 30 forks source link

pyvespa model integration #243

Closed brandhsu closed 1 year ago

brandhsu commented 2 years ago

Are there any plans to make it simpler for users to utilize any type of ml model (cnn, rnn, gnn, etc.) they want with Vespa during inference?

pyvespa is a useful tool to integrate ml models with the Vespa engine, but it still feels limited. For example in sequence-classification-task-with-vespa-cloud it is only able to load huggingface text models. Are there any plans in the future to create a wrapper for users to implement their own models with customizability for pre/postprocessing in pyvespa? I say this because having ml developers rewrite pre/postprocessing in java is not a fun experience. Could this also be possible when the model is running inference within the content cluster? I am finding that loading embeddings directly into Vespa is painless but trying to load models into Vespa causes some pains.

Thanks.

References

thigm85 commented 2 years ago

The biggest requirement today to export models to Vespa via a simpler API is the availability of an ONNX converter. The issue with pre/postprocessing is that they are often not included in the ONNX file when we export the pipeline.

For example, the tokenizer part of the text models is not included in the .onnx file when exporting a sequence classification pipeline. This means that we need to implement the tokenizer in Java (faster) or in python (slower). The pyvespa example implements the tokenizer via python. I think this is useful when ML developers are experimenting with Vespa (that is why pyvespa was created) but you would probably need to provide a Java implementation when using your app in production.

Now, back to your question. When you mention wrappers for users to implement their own models, would it be acceptable for the pre/postprocessing to run in python for experimentation, or do you expect those wrappers to somehow be included in the app via a java implementation?

brandhsu commented 2 years ago

Thanks for the response.

I am aware of ONNX only bundling the actual model and not any additional processing needed.

The main questions I have are:

Is it possible to deploy a live model in vespa with python and take advantage of all the optimizations and scaling of the vespa engine. Lot's of recommendation is done with ML nowadays and I think it would be invaluable if developers are able to deploy a model and add their own customizability with the same source code used to train their models.
Is it possible for users to deploy non-huggingface transformer models and preprocessing using the pyvespa api? If not, are there any wrappers that a user could inherit or override that could allow them to define their own custom code and then deploy their models using the pre-existing apis?
Also, have there been any benchmarks comparing live model inferencing on java vs. python over the same search task in vespa.
Is it possible for python code to run on both the container and content clusters for live inference?

brandhsu commented 2 years ago

I think this is the meat of how live model inferencing is done [ vespa/application.py ], please correct me if I am wrong.

Basically, preprocessing is applied outside the vespa endpoint. Is this slower than if the preprocessing was done within the endpoint?

Also if documents were stored in vespa as raw text, could the predict method above be applied to those documents or would a custom java plugin need to be written?

thigm85 commented 2 years ago

Currently, there is no way to run python code in Vespa. Instead, you need to create Java searchers to run code on the container cluster.
There is no way to deploy general python ML code unless it is possible to export it to ONNX.
We could work on wrappers to run more general ML code in pyvespa like pre and post-processing, but this would happen outside Vespa.
Example of general ML processing outside Vespa includes writing pre-processing steps in a PyTorch pipeline before feeding the data to a Vespa app. See how to generate image embedding to use in a text-image app
I am not aware of a Java and python benchmark that compares performance between Java and Python for document and query processing.

Related to the example above, one direction would be to create a built-in data pipeline in pyvespa that would allow users to include their own pre-processing steps.