k3s-io / kine

Run Kubernetes on MySQL, Postgres, sqlite, dqlite, not etcd.
Apache License 2.0
1.56k stars 233 forks source link

nats: stream not found #131

Open bbigras opened 2 years ago

bbigras commented 2 years ago

I'm getting this. k3s/kine/nats have been running for about 20 minutes.

{"level":"warn","ts":"2022-08-15T18:06:31.382Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

docker-compose.yaml

version: '3.6'
services:
  nats:
    # image: nats
    image: nats:2.8.2
    command: >-
      -js
      -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    ports:
      - 2379:2379
    command: >-
      --endpoint "nats://nats"
      --debug

run k3s

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --cluster-init \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379

log

[...]
{"level":"warn","ts":"2022-08-15T18:07:15.201Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:27.259776      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:27.752178      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:07:49.495Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:57.279814      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:57.763788      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:08:23.986Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:08:27.298772      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:08:27.775153      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Do I need to create a nats stream manually?

brandond commented 2 years ago

No, I don't believe you should have to create the stream. Do you perhaps need to specify the port in addition to the container hostname in the nats:// URI? What are the logs from the kine container? Do you get the same error when testing with nats:2.7.4?

bbigras commented 2 years ago

Using nats://nats:4222 seems to do the same.

Same with nats:2.7.4.

kine.log

brandond commented 2 years ago

That all looks fine - it seems that kine is functioning properly. Its possible that the "stream not found" error is an improperly wrapped return code from nats? I'd have to defer to the nats team, as they contributed the code.

The unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request error is most frequently seen when the metrics-server pod has not yet started. You might investigate why metrics-server isn't working, or start k3s with --disable=metrics-server

bbigras commented 2 years ago

Weird, I still get this, with --disable=metrics-server.

W0815 19:33:12.047943      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.047976      38 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
I0815 19:33:12.047986      38 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W0815 19:33:12.048964      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.049015      38 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0815 19:33:12.049037      38 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0815 19:33:13.437910      38 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 19:33:13.850875      38 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
brandond commented 2 years ago

It looks like it's still there. Make sure you're wiping the nats datastore, otherwise you'll have stuff left over from the previous cluster instance.

bbigras commented 2 years ago

done.

It looks like I only have {"level":"warn","ts":"2022-08-15T19:46:25.050Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000871a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"} now.

brandond commented 2 years ago

Oh, I see you're starting k3s with both --cluster-init and --datastore-endpoint. The cluster-init flag is used to enable the embedded etcd, and should not be used alongside the datastore-endpoint option. I'm honestly surprised this worked at all.

bbigras commented 2 years ago

If I remove --cluster-init, I still get stream not found but the k3s deploy are ready, even the metrics-server!

❯ kubectl get deploy -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   coredns                  1/1     1            1           3m44s
kube-system   local-path-provisioner   1/1     1            1           3m44s
kube-system   metrics-server           1/1     1            1           83s
kube-system   traefik                  1/1     1            1           3m5s
brandond commented 2 years ago

Well you're also starting k3s with --disable-agent, which would prevent you from running any pods unless you are also adding an agent node. Did you remove that flag at the same time you removed --cluster-init, or did you add an agent node?

bbigras commented 2 years ago

Yeah, it seems I removed it. I didn't intend to, but that must explain why the pods are running now.

bbigras commented 2 years ago

So just to be clear, it seems to work except there's still a nats: stream not found warning.

brandond commented 2 years ago

I can't reproduce that, either using nats directly as the k3s datastore endpoint, or using a standalone kine instance. Do you perhaps have some stuff left over from a previous configuration?

bbigras commented 2 years ago

I don't think so.

Note that I also tested running k3s+kine+nats on fly.io so I would guess there's no left over there, but I still get the message.

Maybe I should try to reproduce it with a docker-compose file with all 3 service, with the image tags pinned.

nilathedragon commented 1 year ago

I am able to see the same error message using the latest k0s (v1.24.4+k0s) on a clean install.

Steps to reproduce:

  1. Start NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Install k0s utility $ curl -sSLf https://get.k0s.sh | sudo sh as stated on https://docs.k0sproject.io/v0.12.1/install/
  3. Create K0s configuration mkdir -p /etc/k0s and k0s config create > /etc/k0s/k0s.yaml
  4. Set storage.type to kine and set storage.kine.datasource to nats://<serverip>:4222
  5. Install single node cluster with k0s install controller --single -c /etc/k0s/k0s.yaml
  6. Start k0s with k0s start

The cluster comes up successfully for me, but the logs keep repeating: level=info msg="{\"level\":\"warn\",\"ts\":\"2022-09-21T21:33:44.148Z\",\"logger\":\"etcd-client\",\"caller\":\"v3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"etcd-endpoints://0xc00002fc00//run/k0s/kine/kine.sock:2379\",\"attempt\":0,\"error\":\"rpc error: code = Unknown desc = nats: stream not found\"}" component=kube-apiserver every 30-ish seconds. Sometimes a bit more sometimes a bit less.

I am testing on a Hetzner Cloud Machine, Rocky Linux 8.6 using Docker-CE to run NATS.

brandond commented 1 year ago

I'm not sure how to help with k0s. Can you reproduce on k3s or a standalone instance of kine ?

nilathedragon commented 1 year ago

K0s embeds kine, so I assumed that wouldn't matter 😅 I tested it with k3s, kine and NATS again though:

  1. Start NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Start Kine docker container with docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
  3. Start k3s docker container with docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379

Wait for the cluster to settle and the same error as with k0s will start appearing in the logs. {"level":"warn","ts":"2022-09-22T10:33:23.705Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000a64700/<ip>:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

Test was conducted on the same host as k0s (After a full reset)

Complete Bash History ``` [root@nats-kube-test ~]# history 1 clear 2 dnf update 3 dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo 4 dnf install -y docker-ce 5 systemctl enable --now docker 6 docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV 7 docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://:4222" --debug 8 docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://:2379 9 history ```
brandond commented 1 year ago

@caleblloyd @boxboatmatt @jnmoyne can you assist on this? The error appears to be coming from the nats server. I suspect it is being configured incorrectly somehow?

caleblloyd commented 1 year ago

I wonder if this is from the DbSize call - I noticed that call was incorrect and bundled the fix into the version upgrade:

https://github.com/k3s-io/kine/pull/137/files#diff-fefbd7dc68770cb2275bb6c283ace6fe949dc7bf28aeb56f9481a98598bd491aL896

I believe that DbSize currently does return that error

brandond commented 1 year ago

Interesting. In that case, it should be fine to ignore the error for the time being?