Problem

As we plan to support more LLM services, we want to make sure that our LLM service implementation is solid and can support different APIs.

Currently, the code which uses ollama for local LLMs (and CF as well) "bridges" each request, and because of that it needs part of the backend APIs to be re-implemented - see for instance: https://github.com/masa-finance/masa-oracle/blob/test/pkg/api/handlers_data.go#L692 where we unmarshal the user's request and send it back to the workers. The same applies to other services, which likewise "bridge" by re-implementing part of the APIs.

This is sub-optimal for two main reasons:

[ ] we have to re-implement the API
[ ] any user of the API has to follow our API schema spec, instead of the schema of the service they are already used to

This card is about checking the feasibility of proxying the request instead, having the node act as a "dumb" pipe without having to know the API specs. The scope is the local LLM services for now, but the output of this card should be generic code that can be re-used for other services too.

Acceptance criteria

[ ] We no longer implement the REST APIs of the LLM backends that we support
[ ] The node acts as a proxy between the TCP service and the worker nodes
[ ] We have a generic package handling proxying - the logic is generic for TCP connections, so it can be extended to other services that would benefit from the same approach, avoiding API re-implementation in the first place
Additional context
If unsure, a good baseline is EdgeVPN - it already does this by proxying TCP connections over the libp2p network. See: https://github.com/mudler/edgevpn/blob/master/cmd/service.go