jngz-es opened this issue 2 years ago
You could look at ONNX Runtime for deploying ONNX models. It has a Java API and is used in production in Java at a number of companies (including other search engines). It's designed specifically for inference and is frequently faster than PyTorch. There are a number of configuration options for constraining the execution environment it runs in, and it can exploit GPUs if they are available. We have a wrapper around it in Tribuo, so you could maintain uniformity in the interfaces if that's helpful, though going through Tribuo's interfaces would add some overhead for dense data versus using ORT directly from Java.
(Full disclosure: I wrote the initial version of the ONNX Runtime Java API and now maintain it in the ONNX Runtime project)
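For anyone evaluating it, a minimal sketch of the Java API looks roughly like this; `model.onnx` and the input name `"input"` are placeholders, since real models define their own input names and shapes:

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

import java.util.Arrays;
import java.util.Map;

public class OrtExample {
    public static void main(String[] args) throws OrtException {
        // The environment is a process-wide singleton shared by all sessions.
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        try (OrtSession.SessionOptions options = new OrtSession.SessionOptions();
             OrtSession session = env.createSession("model.onnx", options)) {
            // A single 1x4 float input; the name "input" must match the model's input.
            float[][] data = {{1.0f, 2.0f, 3.0f, 4.0f}};
            try (OnnxTensor input = OnnxTensor.createTensor(env, data);
                 OrtSession.Result result = session.run(Map.of("input", input))) {
                float[][] output = (float[][]) result.get(0).getValue();
                System.out.println(Arrays.toString(output[0]));
            }
        }
    }
}
```

Sessions are thread-safe and intended to be created once and reused across requests, which fits a long-running server process like an OpenSearch node.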
Thanks, Adam, for your comments. You are right; we have actually already put ONNX Runtime on our plan and are still evaluating it.
Ok. Let me know if there are issues with the way the Java bits work, we can fix them (though ONNX Runtime is on a quarterly release cycle so new features will have to wait till the next release). For example we've not added support for writing outputs to preallocated user buffers yet and that may be relevant for high throughput applications (on CPUs anyway, there's an unavoidable copy to get any GPU results back).
@jngz-es thanks for putting this proposal together. I'm very excited about this work. I have some questions/comments:
@kgcreative would love your thoughts ^^^
Thanks for your comments @elfisher. Adding my thoughts here.
- Are we planning to extend the REST API to support uploading without cluster restarts?
- What are the different methods we want to expose?
Yes. We plan to build several new APIs to support custom models:
- Upload model API: upload your own model to OpenSearch (the model will be split into smaller chunks and saved to the ML model index).
- Load model API: load a model into memory.
- Inference API: run inference against a loaded model.
- Unload model API: remove a model from memory.
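To make the proposed flow concrete, here is a hedged sketch of how a Java client might drive these APIs through the low-level OpenSearch REST client. The endpoint paths, the `<model_id>` placeholder, and the request body are illustrative of the proposal (they roughly match the later model serving framework docs) and are not a final contract:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class MlModelApiSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            // Load a previously uploaded model into memory.
            Request load = new Request("POST", "/_plugins/_ml/models/<model_id>/_load");
            System.out.println(client.performRequest(load).getStatusLine());

            // Run inference against the loaded model.
            Request predict = new Request("POST", "/_plugins/_ml/models/<model_id>/_predict");
            predict.setJsonEntity("{\"text_docs\": [\"hello world\"]}");
            Response response = client.performRequest(predict);
            System.out.println(response.getStatusLine());

            // Unload the model to free memory when done.
            Request unload = new Request("POST", "/_plugins/_ml/models/<model_id>/_unload");
            System.out.println(client.performRequest(unload).getStatusLine());
        }
    }
}
```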
- If we are doing the REST API, I'd suggest we explore building a UI in dashboards to facilitate the upload/management.
That's a good suggestion, and we do plan to build a UI to support model management and uploading. But considering our resources, we may plan the tasks in phases. How about we prioritize the REST API first, and then the UI, if there aren't enough resources to work on both in parallel?
- How does this fit in with something like using Hugging Face's Pytorch Transformers?
Good point. Hugging Face models are under consideration. If we don't have enough resources, we plan to support some Hugging Face models first, such as some NLP models, and we can extend the scope to support more models later. What do you think?
@jngz-es - thanks for putting this together!
Some questions for me as well:
Is there a reason for this to be implemented in Java?
Why not dedicate a node for inference and have it implemented in Python? It could be an internal node, so users call the OpenSearch APIs through master nodes, which delegate requests to inference nodes just like search requests are delegated to data nodes. Having a Python inference node would broaden the spectrum of ML/DL capabilities that could be included.
I saw this is moved to 2.4 on the roadmap. I'm updating the label accordingly and adding the roadmap label
Feature Review check-in notes: The team will work with the docs team to create the corresponding doc issue, and document the type of model that will be supported. Additionally, feedback was given to enable GPU acceleration for deep learning. The UX action item is to work on user-flow requirements in parallel while the API work is occurring.
Just to check, have you guys looked at the possibility of using something like https://www.deepdetect.com/ ? It has a few tutorials on how to integrate DeepDetect outputs directly into Elastic.
@diegodorgam I'm not familiar with DeepDetect. Can you share the tutorial link? If possible, it would be best if you could summarize how DeepDetect integrates with Elastic.
Is there a reason for this to be implemented in Java?
Why not dedicate a node for inference and have it implemented in Python? It could be an internal node, so users call the OpenSearch APIs through master nodes, which delegate requests to inference nodes just like search requests are delegated to data nodes. Having a Python inference node would broaden the spectrum of ML/DL capabilities that could be included.
@asfoorial Very sorry that I missed your comment; good suggestion. We thought about this option. We find it challenging to package all the Python dependencies and build a robust/safe/scalable solution for all platforms. It may not be so hard to do that customization manually for one user, but it doesn't seem easy to build a solution that users can simply install and run, the way OpenSearch works now. We welcome any detailed solution for this option.
@ylwu-amzn have you considered embedding Python execution within the JVM using https://github.com/ninia/jep?
Yes, @jngz-es did some research on jep before; it has compatibility issues with some CPython extensions, which can crash the JVM. So we didn't choose jep.
Yes, just like @ylwu-amzn mentioned, we did some testing on jep. On our side, it introduced some availability issues.
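For context, this is roughly what embedding looks like with jep, assuming the jep native library is installed and on the library path. Each interpreter is confined to the thread that created it, and native CPython extensions load into the same process as the JVM, which is exactly where the crashes mentioned above can originate:

```java
import jep.Interpreter;
import jep.SharedInterpreter;

public class JepSketch {
    public static void main(String[] args) {
        // A SharedInterpreter must be created, used, and closed on the same thread.
        try (Interpreter interp = new SharedInterpreter()) {
            interp.exec("import math");
            interp.set("x", 2.0);
            interp.exec("y = math.sqrt(x)");
            System.out.println(interp.getValue("y")); // 1.4142135623730951
        }
    }
}
```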
I think the focus should be directed towards universal runtimes like ONNX which support a large range of other libraries such as TensorFlow and PyTorch. This also enables indirect support for other derivative libraries like HuggingFace Transformers without explicitly writing code for them.
With that being said (sorry, I am out of the loop), are there any discussions or progress on the feature engineering aspect of machine learning? Since ML models expect the data to be in a certain format, there should be built-in transformations that help convert the data (tokenization, tokens to embeddings, standardization, custom functions?) into the form the selected model expects. A case for custom integration of select libraries can be made here, such as HuggingFace Transformers, which provide "processors" or "tokenizers" that transform input into the desired format.
I think I am reiterating some of these from the RFC in #123 (changed my github handle, same guy).
Hi @ashim-mahara, thanks a lot for your suggestion. We released the model serving framework as an experimental feature in 2.4; check this doc: https://opensearch.org/docs/latest/ml-commons-plugin/model-serving-framework/
In 2.4, we support text embedding models and the standard Hugging Face tokenizer. You can find an example model in our code (link). If you unzip the example model, you will find it contains the model TorchScript file and a `tokenizer.json` file. The `tokenizer.json` file is from Hugging Face and defines the tokenization logic. Will the `tokenizer.json` file meet your requirements? Feel free to share your use cases and add your suggestions here.
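As an illustration of how such a `tokenizer.json` can be consumed from Java, here is a hedged sketch using DJL's `HuggingFaceTokenizer` (DJL is the library the model serving framework builds on); the file path and input text are placeholders:

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

import java.nio.file.Paths;
import java.util.Arrays;

public class TokenizerSketch {
    public static void main(String[] args) throws Exception {
        // Load the tokenizer.json shipped alongside the model artifact.
        try (HuggingFaceTokenizer tokenizer =
                     HuggingFaceTokenizer.newInstance(Paths.get("tokenizer.json"))) {
            Encoding encoding = tokenizer.encode("hello from opensearch");
            System.out.println(Arrays.toString(encoding.getIds()));     // token ids
            System.out.println(Arrays.toString(encoding.getTokens()));  // token strings
        }
    }
}
```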
Can the ML Commons plugin be used for getting better search results, or is it only for other tasks?
@hamzashabbir11 you can use the neural-search plugin, which internally uses ML Commons models, to get better search results.
Okay, thanks for your answer @asfoorial. I am a beginner with OpenSearch; I am using it to get better search results for an e-commerce app. How can I implement neural search, and are there any resources you can share?
@hamzashabbir11 it is not very straightforward, but you can start here https://opensearch.org/docs/latest/neural-search-plugin/index/
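To make that concrete, here is a hedged sketch of a `neural` query issued through the Java REST client. It assumes an index whose documents already carry embeddings produced by an ingest pipeline; the index name `products`, the vector field `passage_embedding`, and the model id are placeholders:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class NeuralSearchSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("GET", "/products/_search");
            // The neural clause embeds query_text with the given model and runs k-NN.
            search.setJsonEntity(
                "{\"query\": {\"neural\": {\"passage_embedding\": {"
                    + "\"query_text\": \"comfortable running shoes\","
                    + "\"model_id\": \"<model_id>\","
                    + "\"k\": 10}}}}");
            System.out.println(client.performRequest(search).getStatusLine());
        }
    }
}
```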
Hi all, thanks for your valuable feedback and suggestions.
Starting with 2.5, ml-commons supports running models on GPU; refer to this doc: https://opensearch.org/docs/latest/ml-commons-plugin/gpu-acceleration/ or https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/GPU_support.md
Starting with 2.5, we also support ONNX models. You can find text embedding examples (as of 2.5, we only support text embedding models) in this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md
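As a standalone illustration of what ONNX-on-GPU involves under the hood, here is a hedged ONNX Runtime Java sketch that requests the CUDA execution provider. It assumes the `onnxruntime_gpu` artifact and a compatible CUDA installation, and `model.onnx` is a placeholder:

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

public class OrtGpuSketch {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        try (OrtSession.SessionOptions options = new OrtSession.SessionOptions()) {
            // Ask for the CUDA execution provider on GPU device 0; ONNX Runtime
            // falls back to CPU kernels for any ops CUDA does not support.
            options.addCUDA(0);
            try (OrtSession session = env.createSession("model.onnx", options)) {
                System.out.println("Model inputs: " + session.getInputNames());
            }
        }
    }
}
```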
Once a model is loaded and running in the model serving framework, you can use the neural search plugin to do semantic search; refer to https://opensearch.org/docs/latest/neural-search-plugin/index/
Note: the model serving framework and neural search are still experimental. You are welcome to try them and provide feedback. Let's build a better product together!
It seems that a future version of OpenSearch plans to provide MLOps functionality. Is that right? CRUD for models has been implemented, but there are still many open problems in model management:
@hexbo Thanks for your suggestions, really great points!
We do have a plan to support MLOps. Is it something valuable to you? We'd appreciate it if you could share more details and suggestions, like what functions you need, what pain points you have now, etc.
Regarding the suggestion to support more model formats and inputs: yes, we will support more. Do you have a priority list for these, i.e., which model format/input type we should support first for your use case?
@ylwu-amzn Yeah, I always use TensorFlow to develop DL models. TensorFlow's market share is high at present, so I hope OpenSearch can support TensorFlow model files in an upcoming release.
Thanks @hexbo for sharing this. BTW, can you export your TensorFlow model to ONNX? As of the 2.5 release, we support ONNX. We only support text embedding models for now (as of 2.5). Do you have requirements for other model types? If yes, we would like to learn how you are going to use them in OpenSearch.
Yeah, tf2onnx can do this in general, but tf2onnx can't automatically convert my model.
What / Why
What are we building?
The ML Commons plugin allows users to upload deep learning models and use them to run inference.
Why are we building it?
In many domains such as NLP and computer vision, deep learning algorithms outperform traditional machine learning algorithms. For OpenSearch customers who are planning to use machine learning, or who already use traditional machine learning algorithms for their business, deep learning is definitely a good way to improve their systems.
How do we know?
What matters most to OpenSearch?
Bring deep learning models into OpenSearch. Deep learning is increasingly affecting our lives, and more and more customers want to use deep learning technology to improve their systems. This feature gives customers an opportunity to apply deep learning models in OpenSearch for their business.
What does the customer experience look like?
High-level user stories:
What are the risks and assumptions?
The format of model
There are many deep learning frameworks for training models, such as PyTorch, TensorFlow, and MXNet, and their model formats are not compatible with each other. We could start with TorchScript and ONNX, as PyTorch is a very popular deep learning framework and ONNX is a standard format across various deep learning frameworks.
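As a sketch of what loading a TorchScript artifact from Java could look like, here is a hedged example using DJL (which ml-commons builds on); it assumes the DJL PyTorch engine is on the classpath, and `model.pt` is a placeholder:

```java
import ai.djl.inference.Predictor;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.translate.NoopTranslator;

import java.nio.file.Paths;

public class TorchScriptSketch {
    public static void main(String[] args) throws Exception {
        // Point DJL's PyTorch engine at a traced/scripted TorchScript file.
        Criteria<NDList, NDList> criteria = Criteria.builder()
                .setTypes(NDList.class, NDList.class)
                .optModelPath(Paths.get("model.pt"))
                .optEngine("PyTorch")
                .optTranslator(new NoopTranslator()) // pass raw tensors straight through
                .build();
        try (ZooModel<NDList, NDList> model = criteria.loadModel();
             Predictor<NDList, NDList> predictor = model.newPredictor()) {
            // predictor.predict(new NDList(...)) would run the module's forward().
        }
    }
}
```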
The computation cost
We have a limitation on model size. Whether or not to support distributed inference needs to be considered.
Dedicated ML node with GPU
If there is no dedicated node, we should place stricter resource limits on the models.
ARM support
Support ARM platforms such as AWS Graviton.