pokt-network / poktroll

The official Shannon upgrade implementation of the Pocket Network Protocol implemented using Rollkit.dev
MIT License

[LocalNet] Add infrastructure to run LLM inference #508

Closed okdas closed 2 months ago

okdas commented 2 months ago

Summary

Adds infrastructure to run and develop against LLM on LocalNet.

okdas commented 2 months ago

Note: this functionality is gated and turned off by default, so that LocalNet does not download and serve an LLM unless needed, which preserves resources. Turn on ollama in localnet_config.yaml when needed.

The infrastructure works on its own. The request can be run with curl:

kubectl exec "$(tilt get kd validator -ojsonpath='{.status.pods[0].name}')" -- \
curl -X POST http://ollama:11434/v1/chat/completions -H "Content-Type: application/json" \
    -d '{
        "model": "qwen:0.5b",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

However, it looks like we don't support anything other than JSON-RPC at the moment: https://github.com/pokt-network/poktroll/blob/aba098d8e851ed4883c418f6b8c7cf8a22ed760f/pkg/partials/partial.go#L50

I get the following error:

{"level":"error","error":"got: {\n        \"model\": \"qwen:0.5b\",\n        \"messages\": [\n            {\n                \"role\": \"system\",\n                \"content\": \"You are a helpful assistant.\"\n            },\n            {\n                \"role\": \"user\",\n                \"content\": \"Hello!\"\n            }\n        ]\n    }: unrecognised request format in partial payload","service_id":"ollama","message":"failed getting error reply"}
{"level":"error","error":"got: {\n        \"model\": \"qwen:0.5b\",\n        \"messages\": [\n            {\n                \"role\": \"system\",\n                \"content\": \"You are a helpful assistant.\"\n            },\n            {\n                \"role\": \"user\",\n                \"content\": \"Hello!\"\n            }\n        ]\n    }: unrecognised request format in partial payload","message":"failed getting request type"}

I suggest we merge this as is to unblock work on request types other than JSON-RPC.

Btw, I picked qwen:0.5b as it was one of the smallest recent LLMs. We don't get hardware optimizations in the LocalNet environment, so it makes sense to use the smallest model possible. We can go crazy on DevNet, though.

Olshansk commented 2 months ago

Great find @okdas.

@red-0ne We'll have to prioritize adding support for gRPC, REST and the other request types shortly so we're not limited to just JSON-RPC.

Olshansk commented 2 months ago

@red-0ne With @okdas OOO for the next week, can you update the branch so we can merge it in please?

It'll help unblock development on request types other than JSON-RPC.

red-0ne commented 2 months ago

@Olshansk, I added the ollama services to supplier_stake_configs, along with a small change to the config parser so that it accepts rpc type values in either lower or upper case.
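As a rough sketch of what case-insensitive rpc type parsing can look like in Go (this is not the actual parser change; `RPCType`, its constants, `parseRPCType`, and the accepted strings "JSON_RPC" and "REST" are assumptions made for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// RPCType is a hypothetical enum standing in for the configured rpc type.
type RPCType int

const (
	RPCTypeUnknown RPCType = iota
	RPCTypeJSONRPC
	RPCTypeREST
)

// parseRPCType (hypothetical) normalizes the configured value to upper case
// before matching, so "json_rpc" and "JSON_RPC" map to the same type.
func parseRPCType(raw string) (RPCType, error) {
	switch strings.ToUpper(strings.TrimSpace(raw)) {
	case "JSON_RPC":
		return RPCTypeJSONRPC, nil
	case "REST":
		return RPCTypeREST, nil
	default:
		return RPCTypeUnknown, fmt.Errorf("unsupported rpc type %q", raw)
	}
}

func main() {
	// Mixed-case inputs resolve to the same type; unknown values return an error.
	for _, raw := range []string{"json_rpc", "JSON_RPC", "Rest", "grpc"} {
		t, err := parseRPCType(raw)
		fmt.Println(raw, t, err)
	}
}
```

Normalizing once at parse time keeps the rest of the code comparing against a single canonical value.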

github-actions[bot] commented 2 months ago

The CI will now also run the e2e tests on devnet, which increases the time it takes to complete all CI checks. If you just created a pull request, you might need to push another commit to produce a container image DevNet can utilize to spin up infrastructure. You can use make trigger_ci to push an empty commit.

Olshansk commented 2 months ago

@red-0ne I added this TODO in the code: # TODO(#511): Add support for REST, and enabled this.

Assuming the E2E tests pass and there are no further changes you deem necessary, let's merge it in.

Olshansk commented 2 months ago

@red-0ne - @okdas helped me figure out the issue behind the E2E bugs, which I resolved in [1]. Are you okay with approving this so we can merge it in and iterate on REST later?

[1] https://github.com/pokt-network/protocol-infra/pull/18