Here is an example http request using only cURL sending a POST request to with a json body.
curl "https://koina.wilhelmlab.org/v2/models/Prosit_2019_intensity/infer" \
--data-raw '
{
"id": "LGGNEQVTR_GAGSSEPVTGLDAK",
"inputs": [
{"name": "peptide_sequences", "shape": [2,1], "datatype": "BYTES", "data": ["LGGNEQVTR","GAGSSEPVTGLDAK"]},
{"name": "collision_energies", "shape": [2,1], "datatype": "FP32", "data": [25,25]},
{"name": "precursor_charges", "shape": [2,1], "datatype": "INT32", "data": [1,2]}
]
}'
For examples of how to access models using python you can check out our OpenAPI documentation .
TODO
Koina depends on docker and nvidia-container-toolkit.
You can find an ansible script that installs all dependencies here.
After installing the dependencies you can pull the docker image and run it. If you have multiple GPUs installed on your server you can choose which one is used by modifying --gpus '"device=0"'
When using this docker image you need to accept the terms in the NVIDIA Deep Learning Container License
docker run \
--gpus '"device=0"' \
--shm-size 8G \
--name koina \
-p 8500-8502:8500-8502 \
-d \
--restart unless-stopped \
ghcr.io/wilhelm-lab/koina:latest
If you want to stay up to date with the latest version of Koina we suggest you also deploy containrrr/watchtower.
docker run \
-d \
--name watchtower \
-v /var/run/docker.sock:/var/run/docker.sock \
--restart unless-stopped \
containrrr/watchtower -i 30 --rolling-restart
.env
with your user- and group-id to avoid file permission issuesdocker compose up -d --wait
docker compose logs -f serving
. It the startup wass successful you will see something like this.:
koina-serving-1 | I0615 13:27:04.260871 90 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:8500
koina-serving-1 | I0615 13:27:04.261163 90 http_server.cc:3555] Started HTTPService at 0.0.0.0:8501
koina-serving-1 | I0615 13:27:04.303178 90 http_server.cc:185] Started Metrics Service at 0.0.0.0:8502
Further considerations
Dev Containers
and Remote - SSH
extensions.
Using this system you can connect to the server and open the cloned git repo. You will be prompted to reopen the folder in a DevContainer where a lot of useful dependencies are already installed including the dependencies required for testing, linting and styling. Using the dev-container you can lint your code by running lint
, run tests with pytest
and style your code with black .
serving
container by providing the url serving:8501
for http and serving:8501
for gRPC.Triton supports all major machine learning frameworks. The format you need to save your model in depends on the framework used to train your model. For detailed instructions you can check out this documentation. You can find examples for TensorFlow, PyTorch and XGBoost in our model repository.
For storing the model files themselves we use Zenodo. If you want to add your model to the publicly available Koina instances, You should upload your model file to Zenodo and commit a file named .zenodo
containing the download url in place of the real model file.
A major aspect of Koina, is that all models share a common interface making it easier for clients to use all models.
Triton supports models written in pure python. If your model requires pre- and/or post-processing you can implement this as a "standalone" model in python.
There are numerous examples in this repository. One with low complexity you can find here.
If you made changes to your model you need to restart Triton. You can do that with docker compose restart serving
.
The pre- and postprocessing models you just implemented need to be connected to the Ensemble models don't have any code themselves they just manage moving tensors between other models. This is perfect for combining your potentially various pre- and post-processing steps with your main model to create one single model/workflow.
To make sure that your model was implemented correctly and future changes do not make any unforseen changes you can add tests for it in the test
folder. The files added there should match the model name used in the model repository.