nwcdheap / kops-cn

Quickly deploy a K8S cluster with kops in the AWS China Ningxia / Beijing regions
Apache License 2.0

node NotReady after create and update #65

Open l0vest0rm opened 5 years ago

l0vest0rm commented 5 years ago

When I create the cluster, edit it, and then update it following the instructions, with 3 masters and 1 node in cn-northwest-1, the node stays in NotReady status.

If I change NETWORKING to flannel-vxlan in the Makefile, everything is OK.

My guess is that the node has multiple private IPs, and if the primary IP is not the first one, the node goes NotReady.
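For anyone debugging the same symptom, the node conditions usually say why the kubelet is NotReady. A quick check (a sketch assuming kubectl access to the cluster; the node name below is a placeholder):

```sh
# List nodes with the internal IP each kubelet registered
kubectl get nodes -o wide

# Inspect the NotReady node's conditions (NetworkUnavailable, KubeletNotReady, etc.)
# Replace the node name with one from the output above
kubectl describe node ip-172-20-xx-xx.cn-northwest-1.compute.internal
```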

pahud commented 5 years ago

Hi

Thanks for the feedback. Can you provide the content of your complete Makefile and the full command you used to create the cluster?

l0vest0rm commented 5 years ago

Commands:

```sh
make create-cluster
make edit-cluster   # and add spec.yaml
make update-cluster
```
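As a side note (not part of the repo's instructions), the flannel-vxlan workaround mentioned above could also be passed as a command-line override instead of editing the Makefile, since variables given on the make command line take precedence over the assignments in the file:

```sh
# Override NETWORKING for this invocation without editing the Makefile
make create-cluster NETWORKING=flannel-vxlan
```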

Makefile

```makefile
# customize the values below
TARGET_REGION ?= cn-northwest-1
AWS_PROFILE ?= default
KOPS_STATE_STORE ?= s3://kops.xxxx.com
VPCID ?= vpc-xxxxxx
MASTER_SIZE ?= c5.xlarge
MASTER_COUNT ?= 3
NODE_SIZE ?= c5.2xlarge
NODE_COUNT ?= 1
SSH_PUBLIC_KEY ?= ~/key/k8s-dev.pub
KUBERNETES_VERSION ?= v1.11.9
KOPS_VERSION ?= 1.11.1
NETWORKING ?= amazon-vpc-routed-eni

# do not modify following values
AWS_DEFAULT_REGION ?= $(TARGET_REGION)
AWS_REGION ?= $(AWS_DEFAULT_REGION)

ifeq ($(TARGET_REGION) ,cn-north-1)
CLUSTER_NAME ?= cluster.bjs.k8s.local
AMI ?= ami-0caaf17a3032c1b56
ZONES ?= cn-north-1a,cn-north-1b
endif

ifeq ($(TARGET_REGION) ,cn-northwest-1)
CLUSTER_NAME ?= cluster.zhy.k8s.local
AMI ?= ami-0a863f3b0a0720e6a
ZONES ?= cn-northwest-1a,cn-northwest-1b,cn-northwest-1c
endif

KUBERNETES_VERSION_URI ?= "https://s3.cn-north-1.amazonaws.com.cn/kubernetes-release/release/$(KUBERNETES_VERSION)"

.PHONY: create-cluster
create-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
	--cloud=aws \
	--name=$(CLUSTER_NAME) \
	--image=$(AMI) \
	--zones=$(ZONES) \
	--master-count=$(MASTER_COUNT) \
	--master-size=$(MASTER_SIZE) \
	--node-count=$(NODE_COUNT) \
	--node-size=$(NODE_SIZE) \
	--vpc=$(VPCID) \
	--kubernetes-version=$(KUBERNETES_VERSION_URI) \
	--networking=$(NETWORKING) \
	--ssh-public-key=$(SSH_PUBLIC_KEY)

.PHONY: edit-ig-nodes
edit-ig-nodes:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops edit ig --name=$(CLUSTER_NAME) nodes

.PHONY: edit-cluster
edit-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops edit cluster $(CLUSTER_NAME)

.PHONY: update-cluster
update-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops update cluster $(CLUSTER_NAME) --yes

.PHONY: validate-cluster
validate-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops validate cluster

.PHONY: delete-cluster
delete-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops delete cluster --name $(CLUSTER_NAME) --yes

.PHONY: rolling-update-cluster
rolling-update-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
```
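After `make update-cluster`, a quick way to confirm whether the nodes converge and whether the CNI pods came up is sketched below (assuming kubectl access; with `amazon-vpc-routed-eni` the CNI normally runs as the `aws-node` DaemonSet in `kube-system`, so adjust the name if your deployment differs; the pod name is a placeholder):

```sh
# Let kops verify masters, nodes and system pods
make validate-cluster

# Check that the amazon-vpc CNI pod is running on every node
kubectl -n kube-system get pods -o wide | grep aws-node

# Look at the CNI pod logs on the NotReady node if it is crash-looping
kubectl -n kube-system logs <aws-node-pod-name>
```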

pahud commented 5 years ago

I noticed you are using an old AMI, ami-0caaf17a3032c1b56.

It's different from the AMI provided in the current Makefile:

https://github.com/nwcdlabs/kops-cn/blob/3a58a801caf089c6001f562c2ea71f5bc6882d9b/Makefile#L25

However, this might not be the root cause.
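If the AMI does turn out to matter, one way to move the nodes to the AMI currently pinned in the repo's Makefile is to edit the instance group and roll the instances, e.g. with the targets already defined above (a sketch, not a confirmed fix):

```sh
# Set spec.image in the nodes instance group to the AMI for your region
# (for cn-northwest-1 the Makefile above pins ami-0a863f3b0a0720e6a)
make edit-ig-nodes

# Apply the change and replace the existing instances
make update-cluster
make rolling-update-cluster
```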

l0vest0rm commented 5 years ago

OK, maybe I didn't notice that the AMI had been changed in the Makefile.

pahud commented 5 years ago

@l0vest0rm if you still have the same issue, try reaching out to your AWS SA in China, or email nwcd_labs@nwcdcloud.cn for a private discussion.

YuTingLiu commented 5 years ago

@l0vest0rm Have you solved this problem yet? I have the same problem: only one slave node comes online, and the other slave node cannot be reached via SSH from either outside or inside the cluster.