Closed heyselbi closed 7 months ago
Useful reference: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/caikit/caikit_grpc_query_example.ipynb cc @guimou
Note: the code of the example is still WIP so check with @guimou before doing code changes
Yeah, I have a few changes that I should make by the end of the day. Principally for the channel timeout, and change some parameters to remove anything hard coded.
On Mon., Oct. 2, 2023, 12:56 Daniele Zonca wrote:
> Note: the code of the example is still WIP so check with @guimou before doing code changes
Quick note on this: "Look into generating pb2 files during runtime. Is this an option? Is there an impact on inferencing performance?". The bad news is that the Python grpc-reflection package does not implement on-the-fly stub generation. That capability is built into Java and available in Go through a third-party package, but there is nothing equivalent in Python: you can only retrieve the proto files. Of course you could make a shell call to protoc, but that's really dirty, and you'd have to bundle protoc as well, for all architectures. At the moment it's easier to keep the pb2 files...
This is also an interesting avenue. A wrapper around different serving providers to allow direct use of OpenAI API: https://github.com/BerriAI/litellm
Hey @guimou - I'm the co-maintainer of litellm. Happy to help out via a PR. What's the problem you're hoping to solve here with litellm?
@Xaenalt, @heyselbi, @vaibhavjainwiz and I have met and discussed how we think we should tackle this issue (I am also summarising what we discussed on Slack).
The library requirements:
The implementation:
Caikit-nlp-client

- The `.proto` files and static `_pb2.py` files will be used to provide the serialisation mechanisms and gRPC client (stub).
- The `.proto` files will be generated by executing:
  `RUNTIME_LIBRARY=caikit_nlp python -m caikit.runtime.dump_services $grpc_interface_dir`
- The `_pb2.py` files will be generated from the `.proto` files via the Python code generation (the following example command line might not be 100% accurate):
  `python -m grpc_tools.protoc -I./grpc/ --python_out=. --pyi_out=. --grpc_python_out=. grpc/*.proto`
On top of the generated Python we will write a client class to provide a simple and straightforward way to make the gRPC calls to the NLP service.
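To make the shape of that client class concrete, here is an illustrative sketch with the stub injected (the class name, RPC name, and request shape are placeholders, not the final API; the mm-model-id metadata key follows the grpc_query_example notebook linked above, but treat it as an assumption here):

```python
# Illustrative sketch only: the real stub and request classes come from the
# generated *_pb2 / *_pb2_grpc modules. The stub is injected so the class
# stays independent of channel setup (and is easy to test without a server).
class NlpServiceClient:
    def __init__(self, stub, default_timeout=60.0):
        # e.g. stub = NlpServiceStub(grpc.insecure_channel("host:port"))
        self._stub = stub
        self._timeout = default_timeout

    def generate_text(self, request, model_id):
        # mm-model-id is the metadata key the example notebook uses to
        # route a request to a served model.
        return self._stub.TextGenerationTaskPredict(
            request,
            metadata=[("mm-model-id", model_id)],
            timeout=self._timeout,
        )
```

Injecting the stub also keeps channel concerns (TLS, timeouts, retries) out of the client class itself.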
For now we plan on using the generated `_pb2.py` DTOs (requests and responses) as the model for the HTTP client. I am not 100% sure that using those objects for the HTTP client would work (a cursory Google/Stack Overflow search would indicate that it is possible). I will need to prototype that to make sure it would work.
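One data point in favour: protobuf messages map to and from JSON via google.protobuf.json_format, so the generated DTOs could in principle back an HTTP client. A minimal illustration using the well-known Struct type as a stand-in (the real request messages would come from the generated _pb2 modules; the field names below are invented):

```python
# Demonstrates protobuf <-> JSON dict conversion with json_format.
# The same MessageToDict/ParseDict calls work on any protobuf message,
# including generated _pb2 request/response DTOs.
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

request = Struct()
request.update({"text": "hello", "max_new_tokens": 20})

# Serialise to a plain dict suitable for an HTTP JSON body...
body = json_format.MessageToDict(request)

# ...and parse a JSON response back into a message.
roundtrip = json_format.ParseDict(body, Struct())
assert roundtrip == request
```

MessageToJson/Parse are the string-based equivalents if the HTTP layer wants raw JSON text rather than dicts.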
Provide for some automated tests:
Implement insecure HTTP client https://github.com/vaibhavjainwiz/caikit-nlp-client/pull/31
Initial implementation (wip): https://github.com/opendatahub-io/caikit-nlp-client/pull/1
@heyselbi I think we should still keep this open (but I will defer to your better judgement).
The first version (0.0.2) was released on PyPI: https://pypi.org/project/caikit-nlp-client/. See https://github.com/opendatahub-io/caikit-nlp-client/releases for releases.
A Caikit Python client library, so it can be accessed from a notebook. It would be a wrapper around grpcio/requests for the API, and it can be pip installed in the notebook.
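As a sketch of what such a wrapper might feel like from a notebook (the class name, endpoint path, and payload shape below are assumptions for illustration, not the published API; the session is injected, so in practice it would be a requests.Session):

```python
# Hypothetical sketch of a thin HTTP wrapper, NOT the published
# caikit-nlp-client API. The endpoint path and payload shape are assumptions.
class CaikitNlpHttpClient:
    def __init__(self, base_url, session):
        # session can be a requests.Session, or anything with .post(url, json=...)
        self.base_url = base_url.rstrip("/")
        self.session = session

    def generate_text(self, model_id, text, **params):
        payload = {"model_id": model_id, "inputs": text, "parameters": params}
        resp = self.session.post(
            self.base_url + "/api/v1/task/text-generation", json=payload
        )
        resp.raise_for_status()
        return resp.json()
```

In a notebook this would look like `client = CaikitNlpHttpClient(url, requests.Session())` followed by `client.generate_text("my-model", "Hello")`.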
Task includes:
Related issues: