Open · matchge-ca opened this issue 7 months ago
The PD address passed to BR, `fmt.Sprintf("--pd=%s-pd.%s:%d", backup.Spec.BR.Cluster, clusterNamespace, v1alpha1.DefaultPDClientPort)`, points at a K8s Service that has all PD members as its backends.
So it should resolve to other PD members across different DNS lookup calls.
@csuzhangxc What we actually see in the log is a DNS lookup error from CDC:
pd address (cluster-pd.namespace:2379) not available, error is Get "https://cluster-pd.namespace:2379/pd/api/v1/config/cluster-version": dial tcp: lookup cluster-pd.namespace on 100.64.0.10:53: no such host, please check network: [BR:PD:ErrPDUpdateFailed]failed to update PD
Is there any chance that switching the PD leader causes DNS to report NXDOMAIN, or to return zero A/AAAA records in the ANSWER section?
@kennytm
> Is there any chance that switching the PD leader causes DNS to report NXDOMAIN, or to return zero A/AAAA records in the ANSWER section?

No. A DNS resolution failure is more often caused by the PD pod being down (or by KubeDNS having problems).
Bug Report
What version of Kubernetes are you using?
What version of TiDB Operator are you using?
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
What's the status of the TiDB cluster pods?
What did you do?
error=\"pd address not available, ..., dial tcp: lookup <pd addr>: no such host, please check network
What did you expect to see?
BR keeps running when the PD leader goes offline during discovery.
What did you see instead?
BR failed with an error.