Closed — brandhsu closed this issue 1 year ago
The biggest requirement today for exporting models to Vespa via a simpler API is the availability of an ONNX converter. The issue with pre/post-processing is that it is often not included in the ONNX file when we export the pipeline.
For example, the tokenizer part of a text model is not included in the .onnx file when exporting a sequence-classification pipeline. This means that we need to implement the tokenizer in Java (faster) or in Python (slower). The pyvespa example implements the tokenizer in Python. I think this is useful while ML developers are experimenting with Vespa (that is why pyvespa was created), but you would probably need to provide a Java implementation when running your app in production.
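To illustrate the point, here is a minimal sketch of the kind of pre-processing that does not survive the ONNX export: the tokenizer has to run separately (in Python here) before the encoded ids are handed to the model. The toy vocabulary below is purely illustrative, not a real model's vocabulary.

```python
# Sketch of pre-processing that is NOT captured in an exported .onnx
# file: tokenization runs outside the model, in Python, and only the
# resulting ids are fed to the ONNX graph.
# The vocabulary is a toy example, not any real tokenizer's vocab.

VOCAB = {"[UNK]": 0, "vespa": 1, "is": 2, "fast": 3}

def tokenize(text: str) -> list:
    """Map whitespace-split tokens to ids; unknown tokens -> [UNK]."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]

ids = tokenize("Vespa is fast")
print(ids)  # [1, 2, 3]
```

Whether this step lives in Python (experimentation) or Java (production) is exactly the trade-off described above.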
Now, back to your question. When you mention wrappers for users to implement their own models, would it be acceptable for the pre/post-processing to run in Python for experimentation, or do you expect those wrappers to somehow be included in the app via a Java implementation?
Thanks for the response.
I am aware that ONNX bundles only the actual model and not any additional processing that is needed.
The main questions I have are:
I think this [vespa/application.py] is the meat of how live model inference is done; please correct me if I am wrong.
Basically, preprocessing is applied outside the Vespa endpoint. Is this slower than if the preprocessing were done within the endpoint?
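To make the question concrete, here is a hypothetical sketch of client-side prediction: the pre-processing runs in Python on the caller's side before anything reaches the endpoint. The `tokenize` and `send` callables are stand-ins (not real pyvespa APIs) for the local tokenizer and the HTTP call to Vespa.

```python
from typing import Callable, List

# Hypothetical sketch: pre-processing happens locally, so its cost and
# the payload serialization are paid before the request hits the Vespa
# endpoint. `send` stands in for the actual HTTP call to Vespa.

def predict(text: str,
            tokenize: Callable[[str], List[int]],
            send: Callable[[List[int]], float]) -> float:
    ids = tokenize(text)  # runs in Python, outside the endpoint
    return send(ids)      # only the encoded ids reach Vespa

# Usage with stand-ins for the tokenizer and the endpoint call:
fake_tokenize = lambda t: [len(w) for w in t.split()]
fake_send = lambda ids: sum(ids) / 10.0  # pretend model score
print(predict("hello vespa", fake_tokenize, fake_send))  # 1.0
```

Moving the tokenizer inside the endpoint would remove the extra Python work from the request path, which is presumably where the Java implementation pays off.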
Also, if documents were stored in Vespa as raw text, could the predict method above be applied to those documents, or would a custom Java plugin need to be written?
Related to the example above, one direction would be to create a built-in data pipeline in pyvespa that lets users include their own pre-processing steps.
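A minimal sketch of what such a pipeline could look like, assuming a chain of user-supplied callables applied in order. `Pipeline` is an illustration of the idea, not an existing pyvespa class.

```python
# Hypothetical sketch of a built-in pyvespa data pipeline: users
# register their own pre-processing steps, and the pipeline applies
# them in order before the data reaches the model.

class Pipeline:
    def __init__(self):
        self.steps = []

    def add(self, step):
        """Append a pre-processing callable; returns self for chaining."""
        self.steps.append(step)
        return self

    def __call__(self, value):
        for step in self.steps:
            value = step(value)
        return value

# Usage: lowercase, then split into tokens.
pipe = Pipeline().add(str.lower).add(str.split)
print(pipe("Vespa IS Fast"))  # ['vespa', 'is', 'fast']
```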
Are there any plans to make it simpler for users to use any type of ML model (CNN, RNN, GNN, etc.) with Vespa during inference?
pyvespa is a useful tool to integrate ML models with the Vespa engine, but it still feels limited. For example, in sequence-classification-task-with-vespa-cloud it is only able to load Hugging Face text models. Are there any plans in the future to create a wrapper for users to implement their own models, with customizability for pre/post-processing, in pyvespa? I say this because having ML developers rewrite pre/post-processing in Java is not a fun experience. Could this also be possible when the model is running inference within the content cluster? I am finding that loading embeddings directly into Vespa is painless, but trying to load models into Vespa causes some pains. Thanks.
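The wrapper being asked for might look something like the sketch below: a model object with user-supplied pre- and post-processing hooks. None of these names exist in pyvespa today; this only illustrates the shape such an interface could take.

```python
from typing import Any, Callable

# Hypothetical model wrapper with user-defined pre/post-processing
# hooks, the kind of interface the question asks for. Illustrative
# only; not an existing pyvespa API.

class ModelWrapper:
    def __init__(self,
                 model: Callable,
                 preprocess: Callable[[Any], Any] = lambda x: x,
                 postprocess: Callable[[Any], Any] = lambda y: y):
        self.model = model
        self.preprocess = preprocess
        self.postprocess = postprocess

    def __call__(self, raw_input):
        # raw input -> preprocess -> model -> postprocess
        return self.postprocess(self.model(self.preprocess(raw_input)))

# Usage with a toy "model" that counts tokens:
wrapped = ModelWrapper(
    model=len,
    preprocess=lambda text: text.split(),
    postprocess=lambda n: {"token_count": n},
)
print(wrapped("loading embeddings is painless"))  # {'token_count': 4}
```

With something like this, the pre/post-processing stays in Python for experimentation, deferring the Java rewrite until production.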