secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Installing Kuscia point-to-point (P2P) on K8s: adding Alice's certificate and related info in Bob causes Alice's pod to crash #361

Closed wangzeyu135798 closed 1 week ago

wangzeyu135798 commented 1 month ago

Issue Type

Install/Deploy

Search for existing issues similar to yours

Yes

OS Platform and Distribution

centos7

Kuscia Version

0.8

Deployment

k8s

deployment Version

v1.19.9

App Running type

secretflow

App Running version

1

Configuration file used to run kuscia.

1

What happened and what you expected to happen.

[root@kuscia-autonomy-bob-5665c79cdb-5fkb6 kuscia]# scripts/deploy/add_domain.sh alice p2p
E0703 16:32:06.790544    1028 memcache.go:287] couldn't get resource list for k3s.cattle.io/v1: the server could not find the requested resource
E0703 16:32:06.790545    1028 memcache.go:287] couldn't get resource list for kuscia.secretflow/v1alpha1: the server could not find the requested resource
E0703 16:32:06.790623    1028 memcache.go:287] couldn't get resource list for helm.cattle.io/v1: the server could not find the requested resource
E0703 16:32:06.814569    1028 memcache.go:287] couldn't get resource list for helm.cattle.io/v1: the server could not find the requested resource
E0703 16:32:06.827708    1028 memcache.go:287] couldn't get resource list for kuscia.secretflow/v1alpha1: the server could not find the requested resource
E0703 16:32:06.827764    1028 memcache.go:287] couldn't get resource list for k3s.cattle.io/v1: the server could not find the requested resource
error: resource mapping not found for name: "alice" namespace: "" from "STDIN": no matches for kind "Domain" in version "kuscia.secretflow/v1alpha1"
ensure CRDs are installed first
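The error says the API server that `kubectl` is talking to does not serve kind `Domain` in `kuscia.secretflow/v1alpha1`. A minimal check to run in the same pod before retrying `add_domain.sh` might look like the sketch below. It assumes the CRD object is named `domains.kuscia.secretflow` (the usual plural-dot-group form for kind `Domain`); the function name `check_domain_crd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the Kuscia Domain CRD is served before running add_domain.sh.
# Assumption: the CRD is registered as domains.kuscia.secretflow.
# check_domain_crd is a hypothetical helper name.

check_domain_crd() {
  # List resource types served under the kuscia.secretflow API group;
  # if "domains" is missing here, add_domain.sh fails with "no matches for kind Domain".
  kubectl api-resources --api-group=kuscia.secretflow
  # Confirm the CRD object itself exists (non-zero exit status if it is missing).
  kubectl get crd domains.kuscia.secretflow
}
```

Source the file and call `check_domain_crd` inside the Bob pod; an empty resource list or a `NotFound` from the second command points at the API server or CRD installation rather than at Alice's certificate.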

Kuscia log output.

2024-07-03 16:40:06.119 INFO nlog/nlog.go:77 E0703 16:40:06.119044       6 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: Failed to watch *v1.Pod: the server is currently unable to handle the request (get pods) - error from a previous attempt: read tcp 127.0.0.1:36240->127.0.0.1:6443: read: connection reset by peer
2024-07-03 16:40:08.190 ERROR controller/gateway.go:165 update gateway(name:kuscia-autonomy-bob-5665c79cdb-5fkb6 namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "kuscia-autonomy-bob-5665c79cdb-5fkb6": the object has been modified; please apply your changes to the latest version and try again
2024-07-03 16:40:08.190 ERROR controller/gateway.go:106 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "kuscia-autonomy-bob-5665c79cdb-5fkb6": the object has been modified; please apply your changes to the latest version and try again
2024-07-03 16:40:11.153 ERROR controller/gateway.go:165 update gateway(name:kuscia-autonomy-bob-5665c79cdb-5fkb6 namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "kuscia-autonomy-bob-5665c79cdb-5fkb6": the object has been modified; please apply your changes to the latest version and try again
2024-07-03 16:40:11.153 ERROR controller/gateway.go:106 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "kuscia-autonomy-bob-5665c79cdb-5fkb6": the object has been modified; please apply your changes to the latest version and try again
aokaokd commented 1 month ago

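This happens because Alice's and Bob's image versions are inconsistent. You can:

* Make sure your image version is latest
* Make sure imagePullPolicy in your deployment.yaml is Always
* kubectl apply -f deployment.yaml   # update your image
* Delete the old pod and retry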
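The steps above can be sketched as one helper, run once for Alice and once for Bob. This is a hedged sketch, not an official Kuscia script: the namespace `autonomy-bob` comes from the service URL later in this thread, while the Deployment name `kuscia-autonomy-bob` (inferred from the pod name), the manifest path, and the helper name `refresh_kuscia_pod` are assumptions. It swaps in `kubectl rollout restart` (available since Kubernetes 1.15; this cluster runs v1.19.9) in place of deleting the old pod by hand; both recreate the pod.

```shell
#!/usr/bin/env bash
# Sketch: re-apply the Deployment and recreate its pod so the image is pulled again.
# Assumptions (hypothetical, adjust to your cluster): namespace, Deployment name,
# and manifest path defaults below; the manifest sets imagePullPolicy: Always.

refresh_kuscia_pod() {
  local ns="${1:-autonomy-bob}"
  local deploy="${2:-kuscia-autonomy-bob}"
  local manifest="${3:-deployment.yaml}"

  # Apply the manifest (image ...:latest, imagePullPolicy: Always).
  kubectl apply -f "$manifest" || return 1
  # Recreate the pods; with imagePullPolicy: Always the image is pulled again.
  kubectl rollout restart "deployment/$deploy" -n "$ns" || return 1
  # Block until the new pod is ready.
  kubectl rollout status "deployment/$deploy" -n "$ns"
}
```

For example, `refresh_kuscia_pod autonomy-alice kuscia-autonomy-alice alice-deployment.yaml` for Alice (all three names hypothetical), then `refresh_kuscia_pod` with the defaults for Bob.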
wangzeyu135798 commented 1 month ago

In both Alice's and Bob's deployment.yaml, the image under template: spec: containers: is image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-secretflow:latest
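Using the same `:latest` tag in both manifests does not guarantee both pods run the same image: each pod resolves the tag to a digest when it pulls, so pods pulled at different times can differ. A minimal way to compare what is actually running is to print each pod's resolved `imageID`. The helper name `image_ids` and the namespaces in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print each pod's name and the image digest(s) it is actually running.
# image_ids is a hypothetical helper; pass the namespace of Alice's or Bob's pod.

image_ids() {
  local ns="$1"
  # imageID holds the resolved digest, unlike the mutable :latest tag.
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].imageID}{"\n"}{end}'
}
```

Running `image_ids autonomy-alice` and `image_ids autonomy-bob` (namespace names assumed) and comparing the digests shows whether the two sides really share an image version.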

aokaokd commented 1 month ago

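Hi, judging from the returned logs, your alice resource was not found. If you created it following the official docs, please check the connectivity between the alice and bob pods. You can check with curl -kvvv http://kuscia-autonomy-bob.autonomy-bob.svc.cluster.local:1080/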
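The suggested `curl` check is most useful when it runs from inside Alice's pod, so it exercises the same network path Alice uses to reach Bob. A minimal sketch, assuming `curl` is available in the Kuscia image; the helper name `check_peer`, the `autonomy-alice` namespace, and the pod placeholder are hypothetical, while the default URL is the one given above.

```shell
#!/usr/bin/env bash
# Sketch: run the suggested curl from inside Alice's pod against Bob's service.
# Assumptions: curl exists in the container; check_peer is a hypothetical helper.

check_peer() {
  local ns="$1"
  local pod="$2"
  local url="${3:-http://kuscia-autonomy-bob.autonomy-bob.svc.cluster.local:1080/}"
  # -kvvv prints verbose connection details; --max-time bounds the wait on a dead path.
  kubectl exec -n "$ns" "$pod" -- curl -kvvv --max-time 10 "$url"
}
```

Usage: `check_peer autonomy-alice <alice-pod>`. A timeout or connection refused here suggests a Service, DNS, or network-policy problem between the two namespaces rather than a Kuscia configuration issue.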

github-actions[bot] commented 2 weeks ago

Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.