gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

Helm Chart Not Working (0.2.0) #1391

Closed nashford77 closed 2 years ago

nashford77 commented 2 years ago

Error: INSTALLATION FAILED: create: failed to create: Secret "sh.helm.release.v1.netmaker-helm-1657841004.v1" is invalid: data: Too long: must have at most 1048576 bytes

Maybe this has to do with .helmignore, special characters, or something else?

nashford77 commented 2 years ago

You need to add .git to .helmignore :) Don't ship .git inside your Helm chart; it pushes the release Secret past the 1048576-byte limit.
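
A minimal sketch of that fix, assuming it is run from the chart directory (the one containing Chart.yaml). Helm reads exclusion patterns from a file named .helmignore, and `.git/` is the pattern that `helm create` writes there by default. The command only appends the pattern when an identical line is not already present, so it is safe to re-run.

```shell
# Assumption: the current directory is the chart root (next to Chart.yaml).
# Append the .git/ pattern to .helmignore unless an identical line already exists.
grep -qxF '.git/' .helmignore 2>/dev/null || echo '.git/' >> .helmignore
```

Repackaging the chart afterwards (for example with `helm package .`) should produce a much smaller archive, since the Git history is no longer included.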

nashford77 commented 2 years ago

You also have a node affinity issue ....

Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 2m50s (x1 over 3m50s) default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.

nashford77 commented 2 years ago

  {{- if .Values.mq.singlenode }}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: mqhost
            operator: In
            values:
            - "true" 

kubectl label nodes nodePool=cluster

nashford77 commented 2 years ago

That is where it is coming from. Why do you require an mqhost label? If it is required, why not apply it to every node during install?

nashford77 commented 2 years ago

kubectl label nodes <node-name> mqhost=true (run once for each worker node)

nashford77 commented 2 years ago

netmaker-1657861409-mqtt-766488c886-jrdk7 0/1 CrashLoopBackOff 6 (3m42s ago) 9m36s -

Why are two MQTT pods spinning up? Shouldn't there be just one queue server? The second never starts. There are warnings about a bad volume attach, and although the volume attaches fine afterwards, the container underneath won't come up.

 Normal   SuccessfulAttachVolume  9m50s                       attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f5ac6708-3a78-4a94-bcef-93a8e19b6b2e"
  Normal   Pulled                  9m48s                       kubelet                  Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 323.486898ms
  Normal   Pulled                  9m47s                       kubelet                  Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 531.392095ms
  Normal   Started                 9m29s (x3 over 9m48s)       kubelet                  Started container mosquitto
  Normal   Pulled                  9m29s                       kubelet                  Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 190.972043ms
  Normal   Pulling                 9m8s (x4 over 9m48s)        kubelet                  Pulling image "eclipse-mosquitto:2.0.11-openssl"
  Normal   Created                 9m8s (x4 over 9m48s)        kubelet                  Created container mosquitto
  Normal   Pulled                  9m8s                        kubelet                  Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 199.075891ms
  Warning  BackOff                 <invalid> (x66 over 9m46s)  kubelet                  Back-off restarting failed container

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 15 Jul 2022 01:16:45 -0400
      Finished:     Fri, 15 Jul 2022 01:16:45 -0400
    Ready:          False
    Restart Count:  7
    Liveness:       tcp-socket :8883 delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      tcp-socket :8883 delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:        tcp-socket :8883 delay=0s timeout=1s period=5s #success=1 #failure=30
    Environment:    <none>

afeiszli commented 2 years ago

The mqhost label is meant for the case where you do not have access to an external load balancer to configure. In that case, you select a single node for MQ to run on, and then set DNS to point the broker domain at that worker.
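
That single-node setup could be sketched as below. The function name `pin_mq_node` and the node name in the usage comment are placeholders, not part of the chart; `mqhost=true` is the label the chart's nodeAffinity matches on.

```shell
# Hypothetical helper: label one node so the chart's mqhost nodeAffinity matches it.
pin_mq_node() {
  node="$1"
  # --overwrite makes the command safe to re-run if the label already exists.
  kubectl label nodes "$node" mqhost=true --overwrite
  # List the nodes that carry the label, to confirm where MQ can be scheduled.
  kubectl get nodes -l mqhost=true
}

# Usage (worker-1 is a placeholder node name):
#   pin_mq_node worker-1
```

The broker domain's DNS record would then point at the address of the labeled node.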

afeiszli commented 2 years ago

I'm not sure what issues you are running into with multiple MQ replicas, but in my environment they come up fine.

nashford77 commented 2 years ago

I ran with DNS disabled and it tried to spin up two. I didn't ask it to. Regarding the label: it must be added; without the label, it doesn't work at all.

nashford77 commented 2 years ago

netmaker-1657861409-mqtt-766488c886-5n5kk 1/1 Running 2 (12h ago) 12h
netmaker-1657861409-mqtt-766488c886-jrdk7 0/1 CrashLoopBackOff 154 (3m32s ago) 12h

(kolla-yoga) root@u500-cube-server:/home/gfrid/k8# kubectl logs netmaker-1657861409-mqtt-766488c886-jrdk7
chown: /mosquitto/config/mosquitto.conf: Read-only file system
1657907307: mosquitto version 2.0.11 starting
1657907307: Config loaded from /mosquitto/config/mosquitto.conf.
1657907307: Opening ipv4 listen socket on port 8883.
1657907307: Opening ipv6 listen socket on port 8883.
1657907307: Error: Unable to load CA certificates. Check cafile "/mosquitto/certs/root.pem".
1657907307: Error: Unable to load server certificate "/mosquitto/certs/server.pem". Check certfile.
1657907307: OpenSSL Error[0]: error:02001002:system library:fopen:No such file or directory
1657907307: OpenSSL Error[1]: error:20074002:BIO routines:file_ctrl:system lib
1657907307: OpenSSL Error[2]: error:140DC002:SSL routines:use_certificate_chain_file:system lib

(kolla-yoga) root@u500-cube-server:/home/gfrid/k8# kubectl logs netmaker-1657861409-mqtt-766488c886-5n5kk | more
chown: /mosquitto/config/mosquitto.conf: Read-only file system
1657861585: mosquitto version 2.0.11 starting
1657861585: Config loaded from /mosquitto/config/mosquitto.conf.
1657861585: Opening ipv4 listen socket on port 8883.
1657861585: Opening ipv6 listen socket on port 8883.
1657861585: Opening ipv4 listen socket on port 1883.
1657861585: Opening ipv6 listen socket on port 1883.
1657861585: mosquitto version 2.0.11 running
1657861589: New connection from 172.16.5.151:41290 on port 8883.
1657861589: Client <unknown> disconnected: Protocol error.
1657861594: New connection from 10.100.40.211:39516 on port 1883.
1657861594: New client connected from 10.100.40.211:39516 as Mf7780zq4UWaYscs2J1te75 (p2, c1, k60).
1657861594: New connection from 10.100.63.46:44816 on port 1883.
1657861594: New client connected from 10.100.63.46:44816 as 25UepmnEEhIWAFEJwc200Ky (p2, c1, k60).

Seems like a cert issue. Did you push the root cert to the second pod? Or perhaps it's not mounting correctly with multi-attach on the standard StorageClass for me? Everything else is, though.

afeiszli commented 2 years ago

@nashford77 the certs are loaded from an RWX (ReadWriteMany) volume which is mounted to Netmaker and to the MQ pods. It is very strange that the file would be visible to one MQ container but not the other. I am not sure what could be causing this issue on your install, but in general, you should see the following certs under /mosquitto/certs/:

-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 root.key
-rw-r--r-- 1 mosquitt mosquitt 542 Jul 8 15:12 root.pem
-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 server.key
-rw-r--r-- 1 mosquitt mosquitt 595 Jul 8 15:12 server.pem
-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 serverclient.key
-rw-r--r-- 1 mosquitt mosquitt 554 Jul 8 15:12 serverclient.pem

These are all generated by the Netmaker server and stored in the "shared-certs" PV.
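
One way to compare what each MQ pod actually sees is to list that directory in every replica. This is only a sketch: `list_mq_certs` is a hypothetical helper, and the pod names in the usage comment are the ones from this thread, so substitute your own.

```shell
# Hypothetical helper: print the contents of /mosquitto/certs/ for each pod given.
list_mq_certs() {
  for pod in "$@"; do
    echo "== $pod =="
    # Report a failed exec instead of aborting, so the remaining pods are still checked.
    kubectl exec "$pod" -- ls -l /mosquitto/certs/ || echo "could not list certs in $pod"
  done
}

# Usage (pod names taken from this thread):
#   list_mq_certs netmaker-1657861409-mqtt-766488c886-5n5kk \
#                 netmaker-1657861409-mqtt-766488c886-jrdk7
```

Note that `kubectl exec` needs a running container, so it may fail on a pod that is in CrashLoopBackOff; in that case the listing from the healthy pod at least confirms whether the certs exist on the shared volume.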

afeiszli commented 2 years ago

I've changed the default to remove the need for the "mqhost" label. However, please note that this means by default, you need to modify your loadbalancer to balance 31883 --> 31883 (noted in readme). I've also added .git to the .helmignore.

nashford77 commented 2 years ago

I assume this is only for the case of external queue servers? Is it possible to add a setting so we can enable or disable the external queue and just use one within k8s? I'm still not 100% clear why there are two in the first place, or is there HA for the queues as well? It's odd that one works.

To me it's not a multi-attach issue; it seems to be bad logic. The worker node thinks the volume is not ready, but it's a multi-attach volume for the certs, so more than one thing can mount it. I see it complaining that the volume is not in the expected state, yet every other shared volume works fine, as does this one. I hope that is a useful hint for tracking down this bug. It complains that the volume should only say "available", which is wrong for a multi-attach volume. I don't think it's even attempting the mount; if it did, it would succeed.

afeiszli commented 2 years ago

I am not sure what the issue is for your k8s setup, but you are welcome to scale down MQ to one instance, which should also be fine.
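
Scaling down could look like the sketch below. The helper name `scale_mq` is hypothetical, and the Deployment name in the usage comment is inferred from the pod names earlier in the thread (the pod name minus the ReplicaSet and pod hash suffixes), which is an assumption; `kubectl get deployments` shows the real name.

```shell
# Hypothetical helper: scale the MQ Deployment, defaulting to a single replica.
scale_mq() {
  deployment="$1"
  replicas="${2:-1}"
  kubectl scale deployment "$deployment" --replicas="$replicas"
}

# Usage (Deployment name is an assumption derived from the pod names above):
#   scale_mq netmaker-1657861409-mqtt
```

A manual scale like this changes only the live Deployment, so a later `helm upgrade` may restore the replica count defined by the chart.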