Open l0vest0rm opened 5 years ago
Hi
Thanks for the feedback. Can you provide the content of your complete Makefile and the full command you used to create the cluster?
```shell
make create-cluster
make edit-cluster    # and add spec.yaml
make update-cluster
```
```makefile
TARGET_REGION ?= cn-northwest-1
AWS_PROFILE ?= default
KOPS_STATE_STORE ?= s3://kops.xxxx.com
VPCID ?= vpc-xxxxxx
MASTER_SIZE ?= c5.xlarge
MASTER_COUNT ?= 3
NODE_SIZE ?= c5.2xlarge
NODE_COUNT ?= 1
SSH_PUBLIC_KEY ?= ~/key/k8s-dev.pub
KUBERNETES_VERSION ?= v1.11.9
KOPS_VERSION ?= 1.11.1
NETWORKING ?= amazon-vpc-routed-eni

AWS_DEFAULT_REGION ?= $(TARGET_REGION)
AWS_REGION ?= $(AWS_DEFAULT_REGION)

ifeq ($(TARGET_REGION),cn-north-1)
CLUSTER_NAME ?= cluster.bjs.k8s.local
AMI ?= ami-0caaf17a3032c1b56
ZONES ?= cn-north-1a,cn-north-1b
endif

ifeq ($(TARGET_REGION),cn-northwest-1)
CLUSTER_NAME ?= cluster.zhy.k8s.local
AMI ?= ami-0a863f3b0a0720e6a
ZONES ?= cn-northwest-1a,cn-northwest-1b,cn-northwest-1c
endif

KUBERNETES_VERSION_URI ?= "https://s3.cn-north-1.amazonaws.com.cn/kubernetes-release/release/$(KUBERNETES_VERSION)"

.PHONY: create-cluster
create-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
	--cloud=aws \
	--name=$(CLUSTER_NAME) \
	--image=$(AMI) \
	--zones=$(ZONES) \
	--master-count=$(MASTER_COUNT) \
	--master-size=$(MASTER_SIZE) \
	--node-count=$(NODE_COUNT) \
	--node-size=$(NODE_SIZE) \
	--vpc=$(VPCID) \
	--kubernetes-version=$(KUBERNETES_VERSION_URI) \
	--networking=$(NETWORKING) \
	--ssh-public-key=$(SSH_PUBLIC_KEY)

.PHONY: edit-ig-nodes
edit-ig-nodes:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops edit ig --name=$(CLUSTER_NAME) nodes

.PHONY: edit-cluster
edit-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops edit cluster $(CLUSTER_NAME)

.PHONY: update-cluster
update-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops update cluster $(CLUSTER_NAME) --yes

.PHONY: validate-cluster
validate-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops validate cluster

.PHONY: delete-cluster
delete-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops delete cluster --name $(CLUSTER_NAME) --yes

.PHONY: rolling-update-cluster
rolling-update-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
```
I noticed you use an old AMI, `ami-0caaf17a3032c1b56`, which is different from the AMI provided in the Makefile:
https://github.com/nwcdlabs/kops-cn/blob/3a58a801caf089c6001f562c2ea71f5bc6882d9b/Makefile#L25
However, this might not be the root cause.
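For anyone debugging this, a quick way to rule the AMI in or out is to compare the image IDs of the running instances against the `AMI ?=` value in the Makefile. A hedged sketch (the `KubernetesCluster` tag filter, cluster name, and file path are assumptions; kops tags the instances it creates):

```shell
# Dump the image IDs of the cluster's instances (requires a configured AWS CLI;
# run the real query yourself -- the output below is simulated for illustration):
#   aws ec2 describe-instances \
#     --filters "Name=tag:KubernetesCluster,Values=cluster.zhy.k8s.local" \
#     --query 'Reservations[].Instances[].ImageId' --output text \
#     | tr '\t' '\n' > /tmp/amis.txt
printf 'ami-0a863f3b0a0720e6a\nami-0caaf17a3032c1b56\n' > /tmp/amis.txt

# Flag any instance that is NOT on the expected AMI from the Makefile:
grep -v ami-0a863f3b0a0720e6a /tmp/amis.txt   # prints ami-0caaf17a3032c1b56
```

If the `grep` prints anything, that instance was launched from a stale AMI and a rolling update would be needed to pick up the new one.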
OK, maybe I did not notice that the AMI was changed in the Makefile.
@l0vest0rm if you still have the same issue, try to reach out to your AWS SA in China or email to nwcd_labs@nwcdcloud.cn for private discussion.
@l0vest0rm Have you solved this problem yet? I have the same problem as yours: only one worker node comes online, and the other worker node cannot be reached over SSH from either outside or inside the cluster.
When I create the cluster, then edit and update it following the instructions, with 3 masters and 1 node in cn-northwest-1, the node ends up in NotReady status.

If I change the Makefile to set NETWORKING to flannel-vxlan, it works.

My guess is that it is because the node has multiple private IPs: if the primary IP is not the first one, the node goes NotReady.
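A side note on the flannel-vxlan workaround: since the Makefile assigns `NETWORKING` with `?=`, it can also be overridden on the `make` command line without editing the file. A minimal sketch, assuming GNU make and using a throwaway `/tmp/demo.mk` (hypothetical) that mimics the real assignment:

```shell
# Build a tiny Makefile with the same NETWORKING ?= default.
# (The \t matters: make recipe lines must start with a tab.)
printf 'NETWORKING ?= amazon-vpc-routed-eni\nshow:\n\t@echo $(NETWORKING)\n' > /tmp/demo.mk

make -f /tmp/demo.mk show                            # prints amazon-vpc-routed-eni
make -f /tmp/demo.mk show NETWORKING=flannel-vxlan   # prints flannel-vxlan
```

So `make create-cluster NETWORKING=flannel-vxlan` should work against the real Makefile as well, which keeps the repo's defaults intact.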