Closed nashford77 closed 2 years ago
You need to add .git to .helmignore :) Don't ship .git inside your Helm chart :)
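For anyone hitting the same "Too long: must have at most 1048576 bytes" error: Helm stores the packaged chart in the release Secret, so a large .git directory inside the chart folder can push it over the Secret size limit. A minimal sketch of the fix, assuming `CHART_DIR` (a name chosen here for illustration) points at the directory containing Chart.yaml:

```shell
# Assumption: CHART_DIR is the chart root (the directory with Chart.yaml).
CHART_DIR="${CHART_DIR:-.}"
IGNORE_FILE="$CHART_DIR/.helmignore"

# Add the pattern only if it is not already listed, so re-running is harmless.
touch "$IGNORE_FILE"
grep -qxF '.git/' "$IGNORE_FILE" || printf '%s\n' '.git/' >> "$IGNORE_FILE"
```

After this, re-run the install; if the error persists, something else in the chart directory is probably too large.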
You also have a node affinity issue ....
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 2m50s (x1 over 3m50s) default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
{{- if .Values.mq.singlenode }}
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: mqhost
          operator: In
          values:
          - "true"
kubectl label nodes
That is where it is coming from. Why do you require an mqhost label? If it is required, why not apply it to each node during install?
kubectl label nodes
netmaker-1657861409-mqtt-766488c886-jrdk7 0/1 CrashLoopBackOff 6 (3m42s ago) 9m36s -
Why are two MQTT pods spinning up? Shouldn't there be just one queue server? The second never starts. There are warnings about a bad volume attach, but then the volume attaches fine, and the container underneath still won't come up...
Normal SuccessfulAttachVolume 9m50s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-f5ac6708-3a78-4a94-bcef-93a8e19b6b2e"
Normal Pulled 9m48s kubelet Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 323.486898ms
Normal Pulled 9m47s kubelet Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 531.392095ms
Normal Started 9m29s (x3 over 9m48s) kubelet Started container mosquitto
Normal Pulled 9m29s kubelet Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 190.972043ms
Normal Pulling 9m8s (x4 over 9m48s) kubelet Pulling image "eclipse-mosquitto:2.0.11-openssl"
Normal Created 9m8s (x4 over 9m48s) kubelet Created container mosquitto
Normal Pulled 9m8s kubelet Successfully pulled image "eclipse-mosquitto:2.0.11-openssl" in 199.075891ms
Warning BackOff <invalid> (x66 over 9m46s) kubelet Back-off restarting failed container
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 15 Jul 2022 01:16:45 -0400
Finished: Fri, 15 Jul 2022 01:16:45 -0400
Ready: False
Restart Count: 7
Liveness: tcp-socket :8883 delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: tcp-socket :8883 delay=0s timeout=1s period=10s #success=1 #failure=3
Startup: tcp-socket :8883 delay=0s timeout=1s period=5s #success=1 #failure=30
Environment: <none>
The mqhost label is meant for cases where you do not have access to an external LB to configure. In that case, you can select a single node for MQ to run on, and then point DNS for the broker domain at that worker.
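As a rough sketch of the single-node setup described above: the affinity rule quoted earlier matches nodes whose `mqhost` label is "true", so one worker needs that label before the MQ pod can schedule. The helper names and the `worker-1` node name below are placeholders, not part of the chart:

```shell
# Hypothetical helper: label one worker so the mqhost affinity rule can match it.
label_mq_node() {
  kubectl label nodes "$1" mqhost=true --overwrite
}

# Hypothetical helper: list the nodes the MQ pod is currently allowed to land on.
list_mq_nodes() {
  kubectl get nodes -l mqhost=true
}

# Example (worker-1 is a placeholder node name):
# label_mq_node worker-1 && list_mq_nodes
```

If `list_mq_nodes` comes back empty, the FailedScheduling warnings above are the expected result.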
I'm not sure what issues you are running into with multiple MQ replicas but in my env it is coming up fine.
I ran with DNS disabled and it tried to spin up two. I didn't ask it to. Regarding the label: it must be added; without the label, it doesn't work at all.
netmaker-1657861409-mqtt-766488c886-5n5kk 1/1 Running 2 (12h ago) 12h
netmaker-1657861409-mqtt-766488c886-jrdk7 0/1 CrashLoopBackOff 154 (3m32s ago) 12h
(kolla-yoga) root@u500-cube-server:/home/gfrid/k8# kubectl logs netmaker-1657861409-mqtt-766488c886-jrdk7
chown: /mosquitto/config/mosquitto.conf: Read-only file system
1657907307: mosquitto version 2.0.11 starting
1657907307: Config loaded from /mosquitto/config/mosquitto.conf.
1657907307: Opening ipv4 listen socket on port 8883.
1657907307: Opening ipv6 listen socket on port 8883.
1657907307: Error: Unable to load CA certificates. Check cafile "/mosquitto/certs/root.pem".
1657907307: Error: Unable to load server certificate "/mosquitto/certs/server.pem". Check certfile.
1657907307: OpenSSL Error[0]: error:02001002:system library:fopen:No such file or directory
1657907307: OpenSSL Error[1]: error:20074002:BIO routines:file_ctrl:system lib
1657907307: OpenSSL Error[2]: error:140DC002:SSL routines:use_certificate_chain_file:system lib
(kolla-yoga) root@u500-cube-server:/home/gfrid/k8# kubectl logs netmaker-1657861409-mqtt-766488c886-5n5kk | more
chown: /mosquitto/config/mosquitto.conf: Read-only file system
1657861585: mosquitto version 2.0.11 starting
1657861585: Config loaded from /mosquitto/config/mosquitto.conf.
1657861585: Opening ipv4 listen socket on port 8883.
1657861585: Opening ipv6 listen socket on port 8883.
1657861585: Opening ipv4 listen socket on port 1883.
1657861585: Opening ipv6 listen socket on port 1883.
1657861585: mosquitto version 2.0.11 running
1657861589: New connection from 172.16.5.151:41290 on port 8883.
1657861589: Client <unknown> disconnected: Protocol error.
1657861594: New connection from 10.100.40.211:39516 on port 1883.
1657861594: New client connected from 10.100.40.211:39516 as Mf7780zq4UWaYscs2J1te75 (p2, c1, k60).
1657861594: New connection from 10.100.63.46:44816 on port 1883.
1657861594: New client connected from 10.100.63.46:44816 as 25UepmnEEhIWAFEJwc200Ky (p2, c1, k60).
Seems like a cert issue. Did you push the root cert to the second pod? Or perhaps it is not mounting correctly for me with multi-attach on the standard SC? Everything else is, though.
@nashford77 the certs are loaded from an RWX volume which is mounted to Netmaker and to the MQ pods. It is very strange that the file would be visible to one MQ container but not the other container. I am not sure what could be causing this issue on your install, but in general, you should see the following certs under /mosquitto/certs/
-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 root.key
-rw-r--r-- 1 mosquitt mosquitt 542 Jul 8 15:12 root.pem
-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 server.key
-rw-r--r-- 1 mosquitt mosquitt 595 Jul 8 15:12 server.pem
-rw-r--r-- 1 mosquitt mosquitt 119 Jul 8 15:12 serverclient.key
-rw-r--r-- 1 mosquitt mosquitt 554 Jul 8 15:12 serverclient.pem
These are all generated by the Netmaker server and stored in the "shared-certs" PV.
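One way to compare what each MQ pod actually sees is to list /mosquitto/certs/ in both. This is only a sketch: `check_mq_certs` is a hypothetical helper, and `kubectl exec` needs a running container, so it may fail on a pod that is crash-looping (in which case the failure itself tells you the pod never stays up long enough to inspect).

```shell
# Hypothetical helper: list the cert directory in each MQ pod given as an argument.
check_mq_certs() {
  for pod in "$@"; do
    echo "== $pod"
    kubectl exec "$pod" -- ls -l /mosquitto/certs/
  done
}

# Example with the two pod names from this thread:
# check_mq_certs netmaker-1657861409-mqtt-766488c886-5n5kk netmaker-1657861409-mqtt-766488c886-jrdk7
```

If the healthy pod shows the six files and the other shows an empty directory, the shared volume is probably not mounted the same way on both nodes.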
I've changed the default to remove the need for the "mqhost" label. However, please note that this means by default, you need to modify your loadbalancer to balance 31883 --> 31883 (noted in readme). I've also added .git to the .helmignore.
I assume this is only in the case of external queue servers? Is it possible to have an option to enable / disable the external queue and just use one within k8s? I'm still not 100% clear why there are two in the first place, or is there HA for the queues as well? So odd that one works.

It's not a multi-attach issue; to me it seems to be bad logic. The worker node thinks the volume is not ready, but it's a multi-attached volume for the certs, so more than one thing can mount it. I see it complaining that the volume is not in the expected state, yet every other shared volume works fine, as does this one. Hope that is a useful hint to track down this bug? It complains that the volume should only say "available", which is wrong for multi-attached volumes. I don't think it's even trying the mount at all; if it did, it would succeed.
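To narrow down whether the storage class really hands out a multi-attach volume for the certs, it may help to look at the access modes the claims were actually bound with. A small sketch, where `show_pvc_modes` is a hypothetical helper name and the current namespace is assumed to be the one Netmaker was installed into:

```shell
# Hypothetical helper: show each claim's phase, access modes and storage class.
show_pvc_modes() {
  kubectl get pvc -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,MODES:.spec.accessModes,STORAGECLASS:.spec.storageClassName
}

# Example:
# show_pvc_modes
```

The certs claim would be expected to list ReadWriteMany; if it shows ReadWriteOnce, a second pod on a different node would not be able to attach it.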
I am not sure what the issue is for your k8s setup, but you are welcome to scale down MQ to one instance, which should also be fine.
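Scaling MQ down can be done directly with kubectl. A minimal sketch: `scale_mq` is a hypothetical helper, and the deployment name in the example is only inferred from the pod names earlier in this thread, so check `kubectl get deployments` for the real one.

```shell
# Hypothetical helper: scale the given MQ deployment (default: one replica).
scale_mq() {
  kubectl scale deployment "$1" --replicas="${2:-1}"
}

# Example (deployment name assumed from the pod names above):
# scale_mq netmaker-1657861409-mqtt
```

Note that a manual scale like this may be reset by a later helm upgrade unless the replica count is also changed in the chart values.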
Error: INSTALLATION FAILED: create: failed to create: Secret "sh.helm.release.v1.netmaker-helm-1657841004.v1" is invalid: data: Too long: must have at most 1048576 bytes
Maybe it has to do with .helmignore, special characters, or something else?