Closed heyselbi closed 7 months ago
Useful reference: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/caikit/caikit_grpc_query_example.ipynb cc @guimou
Note: the code of the example is still WIP so check with @guimou before doing code changes
Yeah, I have a few changes that I should make by the end of the day. Principally for the channel timeout, and change some parameters to remove anything hard coded.
On Mon., Oct. 2, 2023, 12:56 Daniele Zonca wrote:
> Note: the code of the example is still WIP so check with @guimou before doing code changes
Quick note on this: "Look into generating pb2 files during runtime. Is this an option? Is there an impact on inferencing performance?". The bad news is that the Python grpc-reflection package does not implement on-the-fly stub generation. That capability is built into Java and available in Go through a third-party package, but there is nothing equivalent in Python: you can only retrieve the proto files. Of course you could make a shell call to protoc, but that's really dirty, and you'd have to bundle protoc as well, for all architectures. At the moment it's easier to keep the pb2 files...
This is also an interesting avenue. A wrapper around different serving providers to allow direct use of OpenAI API: https://github.com/BerriAI/litellm
Hey @guimou - I'm the co-maintainer of litellm. Happy to help out via a PR. What's the problem you're hoping to solve here with litellm?
@Xaenalt, @heyselbi, @vaibhavjainwiz and I have met and discussed how we think we should tackle this issue (I am also summarising what we discussed on Slack).
The library requirements:
The implementation:
Caikit-nlp-client

- The `.proto` files and static `_pb2.py` files will be used to provide the serialisation mechanisms and gRPC client (stub).
- The `.proto` files will be generated by executing:
  `RUNTIME_LIBRARY=caikit_nlp python -m caikit.runtime.dump_services $grpc_interface_dir`
- The `_pb2.py` files will be generated from the `.proto` files via the Python code generation (the following example command line might not be 100% accurate):
  `python -m grpc_tools.protoc -I./grpc/ --python_out=. --pyi_out=. --grpc_python_out=. grpc/*.proto`
On top of the generated Python we will write a client class to provide a simple and straightforward way to make the gRPC calls to the NLP service.
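To make the shape of that client class concrete, here is an illustrative sketch with the stub injected (the class name, RPC name, and request shape are placeholders, not the final API; the mm-model-id metadata key follows the grpc_query_example notebook linked above, but treat it as an assumption here):

```python
# Illustrative sketch only: the real stub and request classes come from the
# generated *_pb2 / *_pb2_grpc modules. The stub is injected so the class
# stays independent of channel setup (and is easy to test without a server).
class NlpServiceClient:
    def __init__(self, stub, default_timeout=60.0):
        # e.g. stub = NlpServiceStub(grpc.insecure_channel("host:port"))
        self._stub = stub
        self._timeout = default_timeout

    def generate_text(self, request, model_id):
        # mm-model-id is the metadata key the example notebook uses to
        # route a request to a served model.
        return self._stub.TextGenerationTaskPredict(
            request,
            metadata=[("mm-model-id", model_id)],
            timeout=self._timeout,
        )
```

Injecting the stub also keeps channel concerns (TLS, timeouts, retries) out of the client class itself.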
For now we plan on using the generated `_pb2.py` DTOs (requests and responses) as the model for the HTTP client. I am not 100% sure that using those objects for the HTTP client would work (a cursory Google/Stack Overflow search would indicate that it is possible). I will need to prototype that to make sure it would work.
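One data point in favour: protobuf messages map to and from JSON via google.protobuf.json_format, so the generated DTOs could in principle back an HTTP client. A minimal illustration using the well-known Struct type as a stand-in (the real request messages would come from the generated _pb2 modules; the field names below are invented):

```python
# Demonstrates protobuf <-> JSON dict conversion with json_format.
# The same MessageToDict/ParseDict calls work on any protobuf message,
# including generated _pb2 request/response DTOs.
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

request = Struct()
request.update({"text": "hello", "max_new_tokens": 20})

# Serialise to a plain dict suitable for an HTTP JSON body...
body = json_format.MessageToDict(request)

# ...and parse a JSON response back into a message.
roundtrip = json_format.ParseDict(body, Struct())
assert roundtrip == request
```

MessageToJson/Parse are the string-based equivalents if the HTTP layer wants raw JSON text rather than dicts.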
Provide for some automated tests:
Implement insecure HTTP client https://github.com/vaibhavjainwiz/caikit-nlp-client/pull/31
Initial implementation (wip): https://github.com/opendatahub-io/caikit-nlp-client/pull/1
@heyselbi I think we should still keep this open (but I will defer to your better judgement).
The first version (0.0.2) was released on PyPI: https://pypi.org/project/caikit-nlp-client/. See https://github.com/opendatahub-io/caikit-nlp-client/releases for releases.
A Caikit Python client library, so it can be accessed from a notebook. It would be a wrapper around grpcio/requests for the API, and it can be pip installed in the notebook.
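As a sketch of what such a wrapper might feel like from a notebook (the class name, endpoint path, and payload shape below are assumptions for illustration, not the published API; the session is injected, so in practice it would be a requests.Session):

```python
# Hypothetical sketch of a thin HTTP wrapper, NOT the published
# caikit-nlp-client API. The endpoint path and payload shape are assumptions.
class CaikitNlpHttpClient:
    def __init__(self, base_url, session):
        # session can be a requests.Session, or anything with .post(url, json=...)
        self.base_url = base_url.rstrip("/")
        self.session = session

    def generate_text(self, model_id, text, **params):
        payload = {"model_id": model_id, "inputs": text, "parameters": params}
        resp = self.session.post(
            self.base_url + "/api/v1/task/text-generation", json=payload
        )
        resp.raise_for_status()
        return resp.json()
```

In a notebook this would look like `client = CaikitNlpHttpClient(url, requests.Session())` followed by `client.generate_text("my-model", "Hello")`.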
Task includes:
Related issues: