Closed: ramesh8830 closed this issue 4 years ago
Hi @akshaymankar
I saw a post where you commented on an issue related to the one I am facing right now: https://github.com/kubernetes-client/haskell/issues/64
Please help me if you know how to fix this issue.
In https://github.com/wireapp/wire-server-deploy/issues/282, you mentioned that you are using minio. Do you use the fake-aws stack all the way down? If that is the case, I believe something is wrong somewhere in your configuration, because the logs show a request to the AWS message queue at sqs.us-east-1.amazonaws.com.
I am not using fake-aws. Certificate errors occur for all the external API URLs. The one below is for the Giphy API.
1:W,7:request,1:=,32:e60ed8ac72b0ab4d31c2b24cd0c58658,13:gateway error,5:error,1:=,644:HttpExceptionRequest Request { host = "api.giphy.com" port = 443 secure = True requestHeaders = [("X-Real-IP","0.0.0.0")] path = "/v1/gifs/random" queryString = "?api_key=7CPNn7Xa5QFsBhPy&tag=Hello" method = "GET" proxy = Nothing rawBody = False redirectCount = 0 responseTimeout = ResponseTimeoutNone requestVersion = HTTP/1.1 } (InternalException (HandshakeFailed (Error_Protocol ("certificate rejected: [NameMismatch \"api.giphy.com\"]",True,CertificateUnknown)))),
This sounds as if it is a rather new development. Is that the case? If so, what changed? If not, is the platform usable for you? If not, which use cases don't work for you?
I didn't understand what you are asking. It's a new development: I am trying to install the production setup on my bare-metal servers using the instructions given in the documentation. I don't know what changes you are talking about.
https://github.com/wireapp/wire-server-deploy/issues/269 describes the same case, and that developer mentioned that he didn't get the error when using a public cloud platform. But I don't want to use a public cloud.
@ramesh8830 I deleted your comment as it seemed to have some aws creds. I copied the text here.
Okay, let me clarify:
- Please explain your overall setup in as much detail as possible (example)
Bare Metal Servers
Installation type: Production
Kubernetes version:Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"6908604ea1efe1179a19b0454d6e3b27df4fa62b", GitTreeState:"clean", BuildDate:"2020-02-16T16:23:09Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"6908604ea1efe1179a19b0454d6e3b27df4fa62b", GitTreeState:"clean", BuildDate:"2020-02-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Installed with Kubespray:
Yes
Helm chart version:Client: &version.Version{SemVer:"v2.13.1", GitCommit:"644187cbf2c37a5d39fbb4147601b4b903dbd7f8", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.13.1", GitCommit:"644187cbf2c37a5d39fbb4147601b4b903dbd7f8", GitTreeState:"clean"}
- Please describe the current behaviour of the platform from a user perspective (what works, what does not)
Chat and audio & video calls work. File/image sharing does not display or download in the chat window.
- Was there a time in the past when the errors you pasted did not appear? If so, what has changed in your setup?
No. Those errors have been appearing from the beginning.
- In case you are open to sharing this information, please provide all your Helm values and secrets files; redact as you find suitable.
Attached [REDACTED]
Hi @akshaymankar I saw a post where you commented on an issue related to the one I am facing right now: kubernetes-client/haskell#64. Please help me if you know how to fix this issue.
That issue occurs when someone uses the pure-Haskell TLS library as a client and talks to a TLS server addressed by IP address. In this case:
- Wire doesn't use the pure-Haskell TLS library; we use the OpenSSL Haskell bindings
- Your server is not being addressed by its IP address.
From the errors in the logs, it seems like there might be a proxy between the Wire components and these external services that is providing an invalid certificate. Can you exec into one of the pods, install openssl, run this command, and let us know what certificate you see and whether that certificate is indeed valid for the domains you're trying to access:
openssl s_client -connect sqs.us-east-1.amazonaws.com:443 <<< "Q"
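The output of that command can be long. A narrower variant (a sketch, assuming a standard openssl build is available in the pod) prints just the subject and issuer of whatever certificate comes back, next to what the name resolves to:

```shell
# What does the hostname resolve to from inside the pod?
getent hosts sqs.us-east-1.amazonaws.com
# Print only the subject/issuer of the certificate served on port 443.
openssl s_client -connect sqs.us-east-1.amazonaws.com:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```

If the subject is not an Amazon one, the connection is being intercepted before it ever reaches AWS.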
bash-5.0# openssl s_client -connect sqs.us-east-1.amazonaws.com:443 <<< "Q"
CONNECTED(00000003)
depth=0 O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
0 s:O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
i:O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDbzCCAlegAwIBAgIQK2vBXe/UfeOQUlWqOyCfDTANBgkqhkiG9w0BAQsFADBL
MRAwDgYDVQQKEwdBY21lIENvMTcwNQYDVQQDEy5LdWJlcm5ldGVzIEluZ3Jlc3Mg
Q29udHJvbGxlciBGYWtlIENlcnRpZmljYXRlMB4XDTIwMDYxNzIyMzA0OVoXDTIx
MDYxNzIyMzA0OVowSzEQMA4GA1UEChMHQWNtZSBDbzE3MDUGA1UEAxMuS3ViZXJu
ZXRlcyBJbmdyZXNzIENvbnRyb2xsZXIgRmFrZSBDZXJ0aWZpY2F0ZTCCASIwDQYJ
KoZIhvcNAQEBBQADggEPADCCAQoCggEBAL1iAFG6hyJHRgcIqCaxaHyRLSpxUGYX
aynrDX8EtDyJPajOwYj5zjNvmF/Iou/hUaebO6irWPBiZ0BHGTZXX8ybPU36alef
fUHsNbYNwsWYeVs/YmGRt3i43L5LVUMEEzfPQMG8QVfIYl2/N1ot+5ZGNdzSPUz4
lc/hQ25IHR8+ml7A8N+T9ws+Gi5HmyC/bRS47BrmBLvXwgDHuBGSwhqM3r9Qcafg
HQck1nDCWr4OpVAIbVvRuNHedJaBIP12k1yIHUaU6kmqiTt+QT80+MHojp2mCTPm
ZZh7MF3UBfPm+171d/5Sy25LDXZUvfkyb/+AdTZ+4S0Poj578hu3aOUCAwEAAaNP
ME0wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB
/wQCMAAwGAYDVR0RBBEwD4INaW5ncmVzcy5sb2NhbDANBgkqhkiG9w0BAQsFAAOC
AQEAOMYJ6dvjOWaAoW3oAaD+hfm1IvkTJLvaJG0iTqooyOA7Fd0kMVAAOsFTionK
M5PqGTqnVb3vYYDtAPzK4JHxwPCPntcbblD+5kW5eC566PGYjy39zIuaM9R4hpAU
JywMb/JgMiMaDIvxzIYAMpuw2ojTXHkh7sS/tnNuLb9xkzlLX0EuIuj4asG006S0
7ij0QeNMAy5RzFyhhxY6CvMa3e7FIDoiHCOA1/e+VAJJJyQmv3Xny4F/BJF1TnS7
V8PDvycm7lRKq0RhyIX5tNsCtMEPCYKZ/ddCT80a8khinxsMmMrzUhf3BCwR5pgE
7p2xF11CAAWjq0qKvkQYq8VrGg==
-----END CERTIFICATE-----
subject=O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
issuer=O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1524 bytes and written 422 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES256-GCM-SHA384
Session-ID: 06E0B37A56FEBD128691908D4821E47234F882A3BFCA0E6BB3CE41BEBAA1A5CF
Session-ID-ctx:
Master-Key: 8C46DC4E01602E39482F3701B3F1F5E62DA50BF8638B0A7F2887A45D30EF5893B44BCB50B2F7DFA9AF2DACFE317B39C3
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 600 (seconds)
TLS session ticket:
0000 - f9 c2 f1 38 77 72 af be-1e 26 b9 5d 72 14 9e 3f ...8wr...&.]r..?
0010 - b6 86 15 77 12 00 1f f0-46 76 01 34 0b 0c dc 12 ...w....Fv.4....
0020 - 7c 20 32 d0 3c eb c9 79-b8 54 3a 86 24 7c 96 2b | 2.<..y.T:.$|.+
0030 - 74 fd 9b 7f f0 10 8a b7-30 d5 3b cf bd 17 19 d0 t.......0.;.....
0040 - cf 96 50 5e 2b da ab 21-1b 54 d0 9e 98 7f e4 a2 ..P^+..!.T......
0050 - 0f 9d 11 d5 e5 5d b5 9c-b5 33 7e 15 08 d4 ef 4a .....]...3~....J
0060 - 6c 7a a7 c7 b8 5b 42 92-ba 42 15 b1 56 81 df b7 lz...[B..B..V...
0070 - b8 63 71 2b d9 69 7d f3-e2 b5 92 20 58 0b bb e9 .cq+.i}.... X...
0080 - b5 65 72 e7 26 eb d2 b1-ac 1b 4b 86 42 03 b3 7e .er.&.....K.B..~
0090 - 4f 0f e7 90 7f ed b0 cc-bf 95 16 11 ff f0 15 f3 O...............
00a0 - af 43 54 4a 6d 12 ce 22-6c 4d a9 65 60 a4 bd 30 .CTJm.."lM.e`..0
Start Time: 1592474652
Timeout : 7200 (sec)
Verify return code: 21 (unable to verify the first certificate)
Extended master secret: yes
---
DONE
We would like to ask for further details:
- Which AWS resources do you use?
SQS, SNS, DynamoDB and now S3
- Which top-level charts did you install? Which ones did you remove afterwards?
Installed: cassandra-external databases-ephemeral elasticsearch-external demo-smtp wire-server nginx-ingress-controller nginx-ingress-services
Removed: fake-aws minio-external
- Browser logs
values.yaml
- you removed
brig.config.aws.region: "eu-west-1"
but didn't replace it with brig.config.aws.region: "us-east-1"
while you still seem to use AWS resources (e.g. SQS, DynamoDB)
That was added in recent commits. However, I have changed the regions inside the chart YAML files to reflect us-east-1.
- you may want to try and set
tags.proxy: false
and see if this changes anything
- following up on what @akshaymankar pointed out: if it's actually the case that your installation is in an environment where you need to configure an HTTP(S) proxy, you can do that for various wire-server services (e.g. for Brig)
I am not using a proxy.
- since you switched from minio to s3, are there any minio leftovers, like the
minio-external
Helm chart?
Removed it.
Additional notes
- you may want to adjust your s3 CORS headers as mentioned by @kvaps here
Setting the S3 CORS headers fixed the image and file issues, and they work fine now. But I wanted to make it work with minio.
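For reference, the kind of CORS rule involved looks roughly like this (a sketch: the bucket name and allowed origin are placeholders, and minio users would apply the equivalent via MinIO's own tooling rather than the aws CLI):

```shell
# Write a CORS policy that lets the webapp origin fetch assets
# from the bucket (origin below is a placeholder).
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://webapp.example.com"],
      "AllowedMethods": ["GET", "HEAD", "POST", "PUT", "DELETE"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF
# Then apply it to the assets bucket (bucket name is a placeholder):
# aws s3api put-bucket-cors --bucket my-assets-bucket --cors-configuration file://cors.json
```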
--- Certificate chain 0 s:O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate i:O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate ---
Looks like sqs.us-east-1.amazonaws.com is resolving to some ingress controller IP address. It suggests that the DNS configuration in your K8s cluster is wrong.
I configured the DNS as per the documentation ("How to set up DNS records") and all my VMs are on static IP addresses.
I am not sure what else I missed. Please help me.
Do I need to install aws-ingress charts as well?
I configured the DNS as per the documentation ("How to set up DNS records") and all my VMs are on static IP addresses.
I meant the DNS server which the pods are using. So, please look into the DNS server configured in your pods and you might find whatever is wrong. If you think your problem is due to something documented in wire-server-deploy, please let us know.
am not using proxy
Yes, you do. But to clarify: @akshaymankar was talking about a general layer-7 HTTP proxy from within the cluster to the outside. I, instead, was just referring to the proxy for previews, which you have enabled in your values.yaml:
# CHANGEME-PROD: All values here should be changed/reviewed
tags:
proxy: true # enable if you want/need giphy/youtube/etc proxying
So, I'd like to ask you to disable (remove, aka un-deploy) it, to see if it changes anything. I doubt it, though.
After you have followed @akshaymankar's suggestions w/o any luck, at this point we would suggest doing a complete redeploy to see if you can reproduce the error. Let us know how it went.
root@kubenode01:# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.233.65.105:53,10.233.66.202:53,10.233.65.105:53 + 3 more... 157d
root@kubenode01:# kubectl exec -n production brig-74bf8d9885-nk94t cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.233.64.218 brig-74bf8d9885-nk94t
root@kubenode01:# kubectl exec -n production brig-74bf8d9885-nk94t cat /etc/resolv.conf
nameserver 169.254.25.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
root@kubenode01:# kubectl exec -ti -n production brig-74bf8d9885-nk94t -- nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
** server can't find kubernetes.default: NXDOMAIN
command terminated with exit code 1
root@kubenode01:# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.233.65.105:53,10.233.66.202:53,10.233.65.105:53 + 3 more... 157d
I am sorry if I am not following you; I am new to this technology.
Please advise if something is wrong with my DNS configuration.
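For context on the NXDOMAIN above: kubernetes.default is a short name that relies on the search line in resolv.conf being applied (with ndots:5, the resolver should try kubernetes.default.production.svc.cluster.local, kubernetes.default.svc.cluster.local, and so on), and some minimal images ship an nslookup that does not apply the search path at all. A diagnostic sketch (the endpoint IP is taken from the kubectl get ep output above; run these inside the pod):

```shell
# Query the nodelocal DNS cache with the fully qualified service name
# (trailing dot disables search-path expansion)...
nslookup kubernetes.default.svc.cluster.local.
# ...and query one kube-dns endpoint directly, bypassing the cache at
# 169.254.25.10, to see which hop fails.
nslookup kubernetes.default.svc.cluster.local. 10.233.65.105
```

If the direct query works but the cached one does not (or vice versa), that narrows down where resolution breaks.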
Hi @lucendio
I disabled the proxy in the values.yaml file and redeployed the server, and those errors are still appearing.
Something is definitely wrong with the way you deployed k8s; not being able to resolve kubernetes.default is a sign. I am not sure if I can do anything to help you.
Hi @akshaymankar
After working on this for many hours, I noticed that those errors only occur in pods on the main Kubernetes master node (kubenode1), where the name resolves to the ingress controller IP address. Pods on the other Kubernetes nodes (kubenode2 and kubenode3) have no errors and resolve correctly when I run openssl s_client -connect sqs.us-east-1.amazonaws.com:443 <<< "Q".
Please advise me on what I have to do.
Thanks, Ramesh
Hey @ramesh8830, I am sorry, I can't do anything here; you'll have to fix your Kubernetes installation yourself. If you have any Wire-specific issue, please let us know.
Hi @akshaymankar
The real problem is with the routing rules provided in the documentation.
iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 31773
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 31772
When I use the above rules, connections from the pods always get the Kubernetes Ingress Controller Fake Certificate and produce those errors; without the rules I don't see the errors, but then I can't route traffic to the front end and back end. The same issue was reported by @rahulit1991 in a previous issue. He mentioned that he doesn't have this problem on cloud servers, but I don't want to use a cloud server.
https://github.com/wireapp/wire-server-deploy/issues/269#issuecomment-631478129
Is there any alternative solution for this routing issue?
Just to give you, @ramesh8830, a better understanding of where those ports are coming from: charts/nginx-ingress-controller.
Of course, it's up to you to adjust or override values, or even discard these two nginx-ingress-... charts completely, in favour of something more fitting to your setup.
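A hedged guess at an alternative, given the symptom that only the node carrying these rules serves the fake certificate: a nat PREROUTING rule matches packets arriving on every interface, including traffic from local pods being routed out, so pod egress to any remote port 443 gets redirected into the ingress controller's NodePort. Restricting the rules to the node's external interface (eth0 below is an assumption; substitute your actual uplink) should leave pod egress alone:

```shell
# Only redirect traffic that arrives on the external interface, so
# outbound HTTPS from pods (which arrives on veth/CNI interfaces)
# is not captured by the ingress controller's NodePort.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j REDIRECT --to-port 31773
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 31772
```

This is a sketch of firewall configuration, not a tested fix; verify with iptables -t nat -L PREROUTING -v and the openssl check from earlier in the thread.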
Hi @lucendio
I wiped all the servers and did a fresh installation following the instructions from Wire, and those errors are still the same on the master node after executing the rules below.
iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 31773
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 31772
Hi,
I also have certificate errors when I try to use real AWS SNS/SQS.
This is what I get when changing the gundeck configuration from fake-aws to real AWS.
1:I,6:logger,1:=,17:cassandra.gundeck,125:Known hosts: [datacenter1:rack1:myipaddress:9042,datacenter1:rack1:myipaddress:9042,datacenter1:rack1:myipaddress:9042],
1:I,6:logger,1:=,17:cassandra.gundeck,73:New control connection: datacenter1:rack1:myipaddress:9042#<socket: 11>,
service: GeneralError (TransportError (HttpExceptionRequest Request {
host = "sqs.eu-central-1.amazonaws.com"
port = 443
secure = True
requestHeaders = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20200912T165236Z"),("X-Amz-Content-SHA256","db03523f1116689f2c131edbe45eb0063f167a419c956bc9f59ba9bb75e67c61"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","
Here is my gundeck configuration:
gundeck:
  replicaCount: 3
  config:
    cassandra:
      host: cassandra-external
    aws:
      account: "633735454791"
      region: "eu-central-1"
      arnEnv: integration
      queueName: gundeck-events
      sqsEndpoint: https://sqs.eu-central-1.amazonaws.com
      snsEndpoint: https://sns.eu-central-1.amazonaws.com
Can someone please help me?
Thanks, Shaif
Hi,
I have a wildcard certificate for the setup and it is a valid one. But a lot of certificate-failure errors appear in the pod logs for the AWS service endpoints.
1:E,6:logger,1:=,11:aws.gundeck,5:error,1:=,891:GeneralError (TransportError (HttpExceptionRequest Request { host = "sqs.us-east-1.amazonaws.com" port = 443 secure = True requestHeaders = [("Host","sqs.us-east-1.amazonaws.com"),("X-Amz-Date","20200617T111732Z"),("X-Amz-Content-SHA256","8d19ceea0c2609868a134d22880a0d1d8ce6ff6b22a4203a12540f23cf9b6f70"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")] path = "/" queryString = "" method = "POST" proxy = Nothing rawBody = False redirectCount = 0 responseTimeout = ResponseTimeoutMicro 70000000 requestVersion = HTTP/1.1 } (InternalException (HandshakeFailed (Error_Protocol ("certificate rejected: [NameMismatch \"sqs.us-east-1.amazonaws.com\"]",True,CertificateUnknown)))))),23:Failed to read from SQS,
{"error":"GeneralError (TransportError (HttpExceptionRequest Request {\n host = \"sqs.us-east-1.amazonaws.com\"\n port = 443\n secure = True\n requestHeaders = [(\"Host\",\"sqs.us-east-1.amazonaws.com\"),(\"X-Amz-Date\",\"20200617T080625Z\"),(\"X-Amz-Content-SHA256\",\"1bcb22493cf3e61a8a93aaf06da833295b32d77fb7ccb06cfa3bbd8cc04968e3\"),(\"Content-Type\",\"application/x-www-form-urlencoded; charset=utf-8\"),(\"Authorization\",\"<REDACTED>\")]\n path = \"/\"\n queryString = \"\"\n method = \"POST\"\n proxy = Nothing\n rawBody = False\n redirectCount = 0\n responseTimeout = ResponseTimeoutMicro 70000000\n requestVersion = HTTP/1.1\n}\n ResponseTimeout))","logger":"aws.brig","msgs":["E","Failed to read from SQS"]}
1:E,7:request,1:=,32:cbf32839bf38138892019bcbc5013f35,679:HttpExceptionRequest Request { host = "sqs.us-east-1.amazonaws.com" port = 443 secure = True requestHeaders = [("Date","Wed, 17 Jun 2020 11:16:57 GMT"),("Authorization","<REDACTED>")] path = "/assets/v3/eternal/d3d776ae-fac1-44bd-918d-beed0b86c920" queryString = "" method = "HEAD" proxy = Nothing rawBody = False redirectCount = 10 responseTimeout = ResponseTimeoutDefault requestVersion = HTTP/1.1 } (InternalException ProtocolError "error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed"),
The pods are getting filled with the above logs. Please help me fix this issue.