Open johnqa opened 2 years ago
@johnqa if you are using custom domains/hosts with self-signed certs, you need to either configure Litmus with the TLS cert or use the SSL skip feature to skip SSL/TLS verification. Alternatively, you can remove TLS if you don't have a certificate configured.
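For anyone wondering what skipping verification changes on the client side, here is a minimal Go sketch. It is not the subscriber's actual code: newClient and probe are hypothetical helpers, and the HTTPS server is a local test server whose certificate is not in the system trust store, standing in for an ingress with a self-signed cert.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newClient returns an HTTP client. When skipVerify is true the client accepts
// any server certificate, which is what a "skip SSL verify" switch usually does.
func newClient(skipVerify bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: skipVerify},
		},
	}
}

// probe starts a local HTTPS server with an untrusted certificate and reports
// whether a GET through newClient(skipVerify) succeeds.
func probe(skipVerify bool) error {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	resp, err := newClient(skipVerify).Get(srv.URL)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	fmt.Println("verification on: ", probe(false)) // expected to fail with an x509 error
	fmt.Println("verification off:", probe(true))  // expected to succeed
}
```

Skipping verification also removes protection against man-in-the-middle attacks, so trusting the certificate is the safer of the two options where it is practical.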
So I added SKIP_SSL_VERIFY to Subscriber deployment but now I have another error:
required key ACCESS_KEY missing value
Is there a secret resource named agent-secret present in the agent ns? That should have the access key.
yes, the secret is there, but what can I do with it?
kubectl get secret agent-secret -n <ns> -oyaml
and share the output
I have added ACCESS_KEY and CLUSTER_ID to the deployment config, but now I get a different error:
level=fatal msg="failed to parse cluster confirm data" data="<html>\r\n<head><title>405 Not Allowed</title></head>\r\n<body>\r\n<center><h1>405 Not Allowed</h1></center>\r\n<hr><center>nginx/1.21.6</center>\r\n</body>\r\n</html>\r\n" error="invalid character '<' looking for beginning of value"
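The last part of that log line is the standard Go JSON decoder error you get when the response body is HTML instead of JSON. In other words, the request reached nginx, which answered with its own 405 page, rather than the GraphQL server. A small sketch that reproduces the message (parse and htmlBody are illustrative names, and the body is a shortened copy of the page in the log):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// htmlBody mimics the nginx error page from the log line above.
const htmlBody = "<html>\r\n<head><title>405 Not Allowed</title></head>\r\n</html>\r\n"

// parse tries to decode body as a JSON object, as a GraphQL client would.
func parse(body string) error {
	var out map[string]interface{}
	return json.Unmarshal([]byte(body), &out)
}

func main() {
	fmt.Println(parse(htmlBody))      // invalid character '<' looking for beginning of value
	fmt.Println(parse(`{"data":{}}`)) // <nil>
}
```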
Can you try to do a fresh install with the skip SSL env var set from the very beginning in the manifest? I think there might be some issues in the manual changes
I am deploying using litmus helm chart, and I don't see where in values.yaml I can put these values for subscriber.
Use this block to add any arbitrary envs for the server https://github.com/litmuschaos/litmus-helm/blob/cdfc397e0e3795ad62266eaf12b6027f2a38759e/charts/litmus/values.yaml#L192
Just add SKIP_SSL_VERIFY: "true" in the generic block
I did it and the current error is:
level=fatal msg="failed to confirm cluster" data= error="Post \"http://litmus.dnsname.int/backend/query\": dial tcp 10.238.40.210:80: i/o timeout"
Can you see if you can curl/wget that url from inside the cluster network? Maybe just start a bash pod in the cluster and try accessing that URL, if it doesn't work then there's some networking or domain setup issue
Using curl, I was not able to connect to http://litmus.dnsname.int/backend/query, but I was able to connect to https://litmus.dnsname.int/backend/query
I have changed the ingress settings to use https instead of http and redeployed, but now the subscriber fails again with the error:
level=fatal msg="failed to parse cluster confirm data" data="<html>\r\n<head><title>405 Not Allowed</title></head>\r\n<body>\r\n<center><h1>405 Not Allowed</h1></center>\r\n<hr><center>nginx/1.21.6</center>\r\n</body>\r\n</html>\r\n" error="invalid character '<' looking for beginning of value"
@johnqa to unblock yourself for now, you can just update the URL to http://litmusportal-server-service:9002/query for the self-agent and continue. Also, can you check the logs of the GraphQL server when the subscriber throws that error?
Using http://litmus-server-service:9002/query finally worked.
Now I am worried about what happens when I have to add an external agent :)
Thank you, John
Awesome so it's confirmed that the problem is with the domain name/tls cert settings. Imo if it is possible for you to just disable tls in ingress and try with http I think things should work fine.
What happened:
I have installed Litmus with the Helm chart and logged in to the Portal.
The self-agent is in Pending, and the litmusportal-subscriber pod fails with the error "failed to confirm cluster", "Post \"https://litmusdns/backend/query\": x509: certificate signed by unknown authority"
My ingress spec looks like this:
What could be causing this error?
Thank you, John