Closed: bysnupy closed this pull request 4 years ago.
Merging #27 into master will not change coverage. The diff coverage is n/a.
@@           Coverage Diff           @@
##           master      #27   +/-   ##
=======================================
  Coverage   75.98%   75.98%
=======================================
  Files          20       20
  Lines         683      683
=======================================
  Hits          519      519
  Misses        157      157
  Partials        7        7
/assign @2uasimojo
Thanks for this @bysnupy! I'll take a closer look next week with @wgordon17.
@bysnupy We're testing this today and tomorrow.
The daemonset.yaml file isn't actually consumed directly by the operator -- it needs to get generated into bindata. Would you please run make gogenerate and commit the updates?
Meanwhile I've done that locally and built the image for @wgordon17 to play with.
@bysnupy Unfortunately, I'm unable to replicate your success. Were you testing with EFS DNS resolution across VPCs? Because it's expected that the EFS URL will resolve within the same VPC.
Testing this, it looks like CoreDNS attempts to query the upstream server; it doesn't seem to treat it as a lookup replacement.
spec:
  servers:
  - forwardPlugin:
      upstreams:
      - 10.10.29.63
    name: efs-lookup
    zones:
    - efs.us-east-1.amazonaws.com
From CoreDNS logs:
[INFO] 10.128.2.24:49594 - 33253 "A IN fs-34fe53b6.efs.us-east-1.amazonaws.com. udp 68 false 4096" NOERROR - 0 6.000966423s
[ERROR] plugin/errors: 2 fs-34fe53b6.efs.us-east-1.amazonaws.com. A: read udp 10.130.0.80:53087->10.10.29.63:53: i/o timeout
[INFO] 10.128.2.24:49594 - 33253 "A IN fs-34fe53b6.efs.us-east-1.amazonaws.com. udp 68 false 4096" NOERROR - 0 4.000555139s
[ERROR] plugin/errors: 2 fs-34fe53b6.efs.us-east-1.amazonaws.com. A: read udp 10.130.0.80:54223->10.10.29.63:53: i/o timeout
I'm able to hack around this by deploying an in-cluster, custom CoreDNS server that handles the static IP lookup: https://github.com/openshift-cs/OpenShift-Troubleshooting-Templates/tree/master/efs-dns-config
So this is still a viable, albeit slightly hackish, solution. But, at least once deployed, it will allow non-cluster-admin and non-CCS customers to access EFS volumes over a VPC peering connection.
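For reference, the in-cluster workaround boils down to a custom CoreDNS server whose Corefile statically maps the EFS hostname to its mount-target IP via the hosts plugin. A minimal sketch follows; the ConfigMap name, the mount-target IP (10.10.30.17), and the port are illustrative placeholders, not values taken from the linked repo:

```yaml
# Hypothetical ConfigMap feeding a custom in-cluster CoreDNS deployment.
# Replace the hostname/IP with your actual EFS filesystem DNS name and
# mount-target IP; the zone and port are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-dns-config
data:
  Corefile: |
    efs.us-east-1.amazonaws.com:5353 {
        # Static A record for the EFS mount target in the peered VPC.
        hosts {
            10.10.30.17 fs-34fe53b6.efs.us-east-1.amazonaws.com
            fallthrough
        }
        log
        errors
    }
```

The cluster DNS Operator's forwardPlugin would then point the efs.us-east-1.amazonaws.com zone at this server's Service IP instead of an unreachable out-of-cluster resolver.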
@2uasimojo Thank you for pointing that out; I've committed it again with the generated files. Some generated files changed only by a period, but I included all the generated files as they are, because that's the real output.
@wgordon17 Thank you for confirming and testing this.
Were you testing with EFS DNS resolution across VPCs? Because it's expected that the EFS URL will resolve within the same VPC.
I tested through VPC peering, with the OCP4 cluster and EFS in different VPCs with different subnet IP ranges, under the same AWS account. After checking in the pod logs that efs-csi-node could not resolve the EFS DNS name and failed to mount EFS, I added a custom upstream DNS server that holds a custom record for the EFS mount target's DNS name.
Testing this, it looks like CoreDNS attempts to query the upstream server; it doesn't seem to treat it as a lookup replacement.
In my test, the "dig" command could immediately resolve the custom EFS DNS name through the configured upstream DNS server, and the efs-csi-node worked well at that point. The upstream DNS server configuration may take a little time to take effect.
If you have any questions about my test, let me know. :)
Thanks for regenerating @bysnupy. I agree the punctuation-only changes in the generated comments are annoying and irrelevant, but it's appropriate to include them.
I'm going to /lgtm /approve this because, at worst, it's not hurting anything. But I'd like to /hold it until we get a bit more feedback from @wgordon17 in case there's something else we can add to make it work for his use case.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: 2uasimojo, bysnupy
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Thanks @2uasimojo 👍🏻 According to the Kubernetes documentation:
"ClusterFirstWithHostNet": For Pods running with hostNetwork, you should explicitly set its DNS policy "ClusterFirstWithHostNet".
From my perspective, ClusterFirstWithHostNet should actually be the correct value for dnsPolicy.
Agree. Thanks for your work on this @bysnupy and @wgordon17.
/hold cancel
Description: By changing dnsPolicy to ClusterFirstWithHostNet, the EFS Operator can refer to the custom EFS DNS records through the DNS Operator without much modification. Implementing hostAliases on the DaemonSet would require many changes to the reconcile loop logic, whereas this approach is simple and achieves the same result with only a few changes. More reference: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/240
More information:
Before changing dnsPolicy (ClusterFirst), the DaemonSet refers to the host's /etc/resolv.conf. This prevents it from using custom DNS records configured through the DNS Operator, and the EFS DNS name cannot be resolved across VPC peering. After changing dnsPolicy to ClusterFirstWithHostNet, the EFS DNS name can be resolved, and the pod can mount a SharedVolume across VPC peering using custom records served through the DNS Operator.
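Concretely, the change described above amounts to setting one field on the DaemonSet's pod spec. The sketch below shows only the relevant fields (the DaemonSet name is illustrative, and selector/container details are omitted):

```yaml
# Sketch of the relevant DaemonSet pod-spec fields; not the full manifest.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: efs-csi-node
spec:
  # (selector, updateStrategy, and container spec omitted)
  template:
    spec:
      hostNetwork: true
      # With plain ClusterFirst, a hostNetwork pod falls back to the host's
      # /etc/resolv.conf. ClusterFirstWithHostNet makes it use the cluster
      # DNS, so custom records from the DNS Operator are visible to the pod.
      dnsPolicy: ClusterFirstWithHostNet
```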
// Operator's SCC is chosen correctly even though dnsPolicy was changed.
$ oc get pod efs-csi-node-pm2m7 -o yaml | grep scc
    openshift.io/scc: efs-csi-scc
// The resolv.conf is now the pod namespace's one.
$ oc rsh efs-csi-node-pm2m7 cat /etc/resolv.conf
Defaulting container name to efs-plugin.
Use 'oc describe pod/efs-csi-node-pm2m7 -n openshift-operators' to see all of the containers in this pod.
search openshift-operators.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal
nameserver 172.30.0.10
options ndots:5
// Yeah, we can get the custom record through DNS Operator.
$ oc rsh efs-csi-node-fn756 curl -v fs-ec8395cd.efs.ap-northeast-1.amazonaws.com
Defaulting container name to efs-plugin.
Use 'oc describe pod/efs-csi-node-fn756 -n openshift-operators' to see all of the containers in this pod.