Closed shnnosuke34725 closed 4 days ago
On the alice cluster, the bob and carol domains must exist. On the bob cluster, the alice and carol domains must exist. On the carol cluster, the alice and bob domains must exist.
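The rule above amounts to: every cluster needs a domain entry for each of the other parties. A minimal Python sketch of that rule, assuming only the three party names used in this thread; `required_domains` is a hypothetical helper for illustration and is not part of Kuscia.

```python
def required_domains(parties):
    """Map each cluster to the peer domains that must exist on it.

    Hypothetical helper: it only restates the rule from the thread and
    does not talk to any Kuscia API.
    """
    return {p: [q for q in parties if q != p] for p in parties}


if __name__ == "__main__":
    for cluster, peers in required_domains(["alice", "bob", "carol"]).items():
        print(f"{cluster} cluster needs domains: {', '.join(peers)}")
```

The result can be used as a checklist when comparing against the domains actually present on each node.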
I checked the route configuration with kubectl get cdr, and the READY column is empty for alice-carol and carol-alice. Does it need to be True here?
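The deployment tutorial says that establishing authorization requires the two nodes to be directly connected. However, with alice-bob-carol node forwarding deployed, alice and carol do not need a direct connection. How should authorization be established in that case?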
You can refer to https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn#id4 to add the domains, following what @gshilei said above.
Hello, sorry to bother you again. After configuring the cdr, running the job shows the following status:

status:
  approveStatus:
    alice: JobAccepted
  conditions:
  - lastTransitionTime: "2024-09-10T08:38:47Z"
    status: "True"
    type: JobValidated
  lastReconcileTime: "2024-09-10T08:38:47Z"
  phase: AwaitingApproval
  stageStatus:
    alice: JobCreateStageSucceeded
  startTime: "2024-09-10T08:38:47Z"

What is the problem here? I have checked, and the domaindatagrant resources are all configured.
Annotations in the job metadata:
annotations:
  kubectl.kubernetes.io/last-applied-configuration: |
    {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"KusciaJob","metadata":{"annotations":{},"name":"job-best-effort-linear","namespace":"cross-domain"},"spec":{"initiator":"alice","maxParallelism":2,"scheduleMode":"BestEffort","tasks":[{"alias":"job-psi","appImage":"secretflow-image","parties":[{"domainID":"alice"},{"domainID":"carol"}],"priority":100,"taskID":"job-psi","taskInputConfig":"{"sf_datasource_config":{"alice":{"id":"default-data-source"},"carol":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","carol"],"devices":[{"name":"spu","type":"spu","parties":["alice","carol"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice","carol"],"config":"{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"psi","version":"0.0.5","attr_paths":["protocol","sort_result","allow_duplicate_keys","allow_duplicate_keys/yes/join_type","allow_duplicate_keys/yes/join_type/left_join/left_side","input/receiver_input/key","input/sender_input/key"],"attrs":[{"s":"PROTOCOL_ECDH"},{"b":true},{"s":"yes"},{"s":"left_join"},{"ss":["alice"]},{"ss":["id1"]},{"ss":["id2"]}]},"sf_input_ids":["alice-table","carol-table"],"sf_output_ids":["psi-output"],"sf_output_uris":["psi-output.csv"]}"},{"alias":"job-split","appImage":"secretflow-image","dependencies":["job-psi"],"parties":[{"domainID":"alice"},{"domainID":"carol"}],"priority":100,"taskID":"job-split","taskInputConfig":"{"sf_datasource_config":{"alice":{"id":"default-data-source"},"carol":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","carol"],"devices":[{"name":"spu","type":"spu","parties":["alice","carol"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice","carol"],"config":"{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"train_test_split","version":"0.0.1","attr_paths":["train_size","test_size","random_state","shuffle"],"attrs":[{"f":0.75},{"f":0.25},{"i64":1234},{"b":true}]},"sf_output_uris":["train-dataset.csv","test-dataset.csv"],"sf_output_ids":["train-dataset","test-dataset"],"sf_input_ids":["psi-output"]}"}]}}
  kuscia.secretflow/initiator: alice
  kuscia.secretflow/interconn-kuscia-parties: carol
  kuscia.secretflow/interconn-self-parties: alice
  kuscia.secretflow/self-cluster-as-initiator: "true"
creationTimestamp: "2024-09-10T09:15:01Z"
generation: 1
name: job-best-effort-linear
namespace: cross-domain
resourceVersion: "1083332"
uid: 5fd9fdbe-2e3f-4e67-81ec-eac49c8e58ab
Status in the job metadata:

status:
  approveStatus:
    alice: JobAccepted
  conditions:
  - lastTransitionTime: "2024-09-10T09:15:01Z"
    status: "True"
    type: JobValidated
  lastReconcileTime: "2024-09-10T09:15:01Z"
  phase: AwaitingApproval
  stageStatus:
    alice: JobCreateStageSucceeded
  startTime: "2024-09-10T09:15:01Z"
kubectl get cdr in the alice container:

NAME          SOURCE   DESTINATION   HOST              AUTHENTICATION   READY
carol-alice   carol    alice         192.168.123.89    Token            True
alice-carol   alice    carol         192.168.123.198   Token            True
alice-bob     alice    bob           192.168.123.93    Token            True
bob-alice     bob      alice                           Token            True
kubectl get cdr in the bob container:

NAME          SOURCE   DESTINATION   HOST              AUTHENTICATION   READY
bob-carol     bob      carol                           Token            True
carol-bob     carol    bob           192.168.123.93    Token            True
alice-carol   alice    carol         192.168.123.198   Token            True
carol-alice   carol    alice         192.168.123.89    Token            True
kubectl get cdr in the carol container:

NAME          SOURCE   DESTINATION   HOST              AUTHENTICATION   READY
bob-carol     bob      carol                           Token            True
carol-bob     carol    bob           192.168.123.93    Token            True
alice-carol   alice    carol         192.168.123.198   Token            True
carol-alice   carol    alice         192.168.123.89    Token            True
kubectl get pod -A shows "No resources found".
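From the output, this job has only two parties, alice and carol. alice has already approved it, but no approval status has been received from carol. Check on the carol node whether the job exists there. If it does, search kuscia.log for the jobID to see whether any errors are reported.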
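The check described above can be sketched in a few lines of Python: compare the job's party list with the entries under approveStatus and report which parties have not accepted yet. This is only an illustration using the values posted in this thread; `pending_approvals` is a hypothetical helper, and the assumption that an approved party is recorded as `JobAccepted` is taken from the status output quoted here.

```python
def pending_approvals(parties, approve_status):
    """Return the parties with no JobAccepted entry in approveStatus.

    Hypothetical helper; the inputs are plain Python values copied from
    the job spec and status, not fetched from a cluster.
    """
    return [p for p in parties if approve_status.get(p) != "JobAccepted"]


if __name__ == "__main__":
    parties = ["alice", "carol"]                # from spec.tasks[].parties
    approve_status = {"alice": "JobAccepted"}   # from status.approveStatus
    print(pending_approvals(parties, approve_status))
```

With the values from this thread, only carol is reported, which matches the phase staying at AwaitingApproval.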
I checked, and this job does not exist on the carol node.
1. Result of kubectl get interop:

NAME            AGE
carol-2-bob     5d3h
carol-2-alice   3h13m

2. Below are the last few lines of errors; I am not sure whether this is the relevant part:

.26.11/tools/cache/reflector.go:169: failed to list v1alpha1.KusciaDeployment: the server has asked for the client to provide credentials (get kusciadeployments.kuscia.secretflow)
2024-09-10 19:40:02.070 INFO nlog/nlog.go:77 W0910 19:40:02.070399 827 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: failed to list v1alpha1.KusciaDeployment: the server has asked for the client to provide credentials (get kusciadeployments.kuscia.secretflow)
2024-09-10 19:40:02.070 INFO nlog/nlog.go:77 E0910 19:40:02.070648 827 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: Failed to watch v1alpha1.KusciaDeployment: failed to list v1alpha1.KusciaDeployment: the server has asked for the client to provide credentials (get kusciadeployments.kuscia.secretflow)
2024-09-10 19:40:02.070 INFO nlog/nlog.go:77 E0910 19:40:02.070648 827 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: Failed to watch v1alpha1.KusciaDeployment: failed to list v1alpha1.KusciaDeployment: the server has asked for the client to provide credentials (get kusciadeployments.kuscia.secretflow)
2024-09-10 19:40:02.070 INFO nlog/nlog.go:77 E0910 19:40:02.070648 827 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: Failed to watch v1alpha1.KusciaDeployment: failed to list v1alpha1.KusciaDeployment: the server has asked for the client to provide credentials (get kusciadeployments.kuscia.secretflow)
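When a log repeats the same error many times, tallying it per resource kind can make a report easier to read. A minimal Python sketch, assuming the lines look like the ones quoted above; the regular expression is derived only from those lines, and `count_credential_errors` is a hypothetical helper rather than a Kuscia tool.

```python
import re
from collections import Counter

# Pattern taken from the log lines in this thread; other Kuscia or
# client-go versions may word the message differently.
CRED_ERROR = re.compile(
    r"failed to list (\S+): the server has asked for the client to provide credentials"
)


def count_credential_errors(lines):
    """Count credential errors per resource kind in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CRED_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "W0910 reflector.go:424] failed to list v1alpha1.KusciaDeployment: "
        "the server has asked for the client to provide credentials",
        "E0910 reflector.go:140] Failed to watch v1alpha1.KusciaDeployment: "
        "failed to list v1alpha1.KusciaDeployment: "
        "the server has asked for the client to provide credentials",
        "an unrelated log line",
    ]
    for kind, n in count_credential_errors(sample).items():
        print(f"{kind}: {n}")
```

To run it against a real log, pass an open file object (for example the kuscia.log mentioned earlier in the thread) instead of the sample list.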
The carol-alice cdr (I am not sure whether this is the one you mean):

apiVersion: kuscia.secretflow/v1alpha1
kind: ClusterDomainRoute
metadata:
  name: carol-alice
spec:
  authenticationType: Token
  source: carol
  destination: alice
  endpoint:
    host: 192.168.123.89
    ports:
apiVersion: kuscia.secretflow/v1alpha1 kind: Domain metadata: annotations: domain/carol: kuscia.secretflow/domain-type=embedded kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"Domain","metadata":{"annotations":{"domain/carol":"kuscia.secretflow/domain-type=embedded"},"name":"carol"},"spec":{"authCenter":{"authenticationType":"Token","tokenGenMethod":"RSA-GEN"},"cert":"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQVRBTkJna3Foa2lHOXcwQkFRc0ZBREFRTVE0d0RBWURWUVFERXdWallYSnYKYkRBZ0Z3MHlOREE1TURNd056VTNNRGxhR0E4eU1EYzBNRGt3TXpBM05UY3dPVm93RURFT01Bd0dBMVVFQXhNRgpZMkZ5YjJ3d2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUURpV2dXckgyM0hQaEVWCllRWVNaSnFXYWM0aXBOSW4xbkoxdk9VWnk5NEhreWlFRElUdXVaZG8rQlpsS1Q5bmVQM3c4QTVPQ2lmQWMzN3oKeDVJZkhGQVhvUWk4U2orR0hiSTVFbE1RNFhvMjF3Ulh1TDRFc0hHaG8yckNFQ2d6OGhVTzB0YUlhWm1SNVBOMQpHNXRPeExKY1FNeVptTnEzNXVqRXpGNkRhU2tmdU9JdjUvTTgyT3M2WlFyR1pZT3ZrUDF5aWExVHZVQ0pLTFJiClc3a1N6TXVtSStwMnRLUzc3eUJENVR4cjdCdWEwR0VzZWlCdDBpYVZwU1pOT3VFMTI0UzAzWW5RbUl6Sm5CMWgKbDVBWnNpdkFRTWNRaXluT3hpdnBhL3FuOHFRQnB6ajZIdkQ1RUxCSkdya3NVVmszMVJ2RDlpay9tMTZVVUtObAphL2FjVE1jRkFnTUJBQUdqWVRCZk1BNEdBMVVkRHdFQi93UUVBd0lDaERBZEJnTlZIU1VFRmpBVUJnZ3JCZ0VGCkJRY0RBZ1lJS3dZQkJRVUhBd0V3RHdZRFZSMFRBUUgvQkFVd0F3RUIvekFkQmdOVkhRNEVGZ1FVZTlWMWR3aEcKcEtsbW5vbUtLSTZvVTZMMVlmQXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRE93TW0zcHFDbGFDZ0hiMWxUKwp1WFRsMlNPSUFLQUZVWit4YmpHQ1NDKzR0K1pkNm8vRUJ3SVhTaWJ0ZENXVDRTNnJ6QndFNStNL0MwQXZaSlY4CkRKMyswR1o4Q3pyVTY2NnFQaG5GRUZZZythNWsrc2FCTER3WC80TkE0ZElucDFWUDRzYmRqYnlaemxIbEZ3K1UKaUlSQWxZNWs5ZGNyNUxiM082U1ZlR015Qm4vV2YxRE5BZXBDVG5RQWxoTjFpVjJkRFZnWGVuTHAvdWJPS1ErQwp2K0VwZFEwd2RCZEk5NS9JZHd2c29xTWI4T2kvaXgrZjhzTlRTUzBSZGtQcTFDV1Fiei9SekNzWDNJcXFMNnU2CjVjeFBDWEdsSm9VZUNjSUtUTzdQWE8xazVBY3NOTDlQTEFmWUJuanlVVDJUbG9ydlNrcnZmenZwblloN25sVkoKRk9BPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==","interConnProtocols":["kuscia"],"master":"carol","role":"partner"}} creationTimestamp: 
"2024-09-10T07:48:00Z" generation: 1 labels: kuscia.secretflow/domain-auth: completed name: carol resourceVersion: "1075160" uid: 96725c48-1a2a-46bd-92e9-2659913f7660 spec: authCenter: authenticationType: Token tokenGenMethod: RSA-GEN cert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQVRBTkJna3Foa2lHOXcwQkFRc0ZBREFRTVE0d0RBWURWUVFERXdWallYSnYKYkRBZ0Z3MHlOREE1TURNd056VTNNRGxhR0E4eU1EYzBNRGt3TXpBM05UY3dPVm93RURFT01Bd0dBMVVFQXhNRgpZMkZ5YjJ3d2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUURpV2dXckgyM0hQaEVWCllRWVNaSnFXYWM0aXBOSW4xbkoxdk9VWnk5NEhreWlFRElUdXVaZG8rQlpsS1Q5bmVQM3c4QTVPQ2lmQWMzN3oKeDVJZkhGQVhvUWk4U2orR0hiSTVFbE1RNFhvMjF3Ulh1TDRFc0hHaG8yckNFQ2d6OGhVTzB0YUlhWm1SNVBOMQpHNXRPeExKY1FNeVptTnEzNXVqRXpGNkRhU2tmdU9JdjUvTTgyT3M2WlFyR1pZT3ZrUDF5aWExVHZVQ0pLTFJiClc3a1N6TXVtSStwMnRLUzc3eUJENVR4cjdCdWEwR0VzZWlCdDBpYVZwU1pOT3VFMTI0UzAzWW5RbUl6Sm5CMWgKbDVBWnNpdkFRTWNRaXluT3hpdnBhL3FuOHFRQnB6ajZIdkQ1RUxCSkdya3NVVmszMVJ2RDlpay9tMTZVVUtObAphL2FjVE1jRkFnTUJBQUdqWVRCZk1BNEdBMVVkRHdFQi93UUVBd0lDaERBZEJnTlZIU1VFRmpBVUJnZ3JCZ0VGCkJRY0RBZ1lJS3dZQkJRVUhBd0V3RHdZRFZSMFRBUUgvQkFVd0F3RUIvekFkQmdOVkhRNEVGZ1FVZTlWMWR3aEcKcEtsbW5vbUtLSTZvVTZMMVlmQXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRE93TW0zcHFDbGFDZ0hiMWxUKwp1WFRsMlNPSUFLQUZVWit4YmpHQ1NDKzR0K1pkNm8vRUJ3SVhTaWJ0ZENXVDRTNnJ6QndFNStNL0MwQXZaSlY4CkRKMyswR1o4Q3pyVTY2NnFQaG5GRUZZZythNWsrc2FCTER3WC80TkE0ZElucDFWUDRzYmRqYnlaemxIbEZ3K1UKaUlSQWxZNWs5ZGNyNUxiM082U1ZlR015Qm4vV2YxRE5BZXBDVG5RQWxoTjFpVjJkRFZnWGVuTHAvdWJPS1ErQwp2K0VwZFEwd2RCZEk5NS9JZHd2c29xTWI4T2kvaXgrZjhzTlRTUzBSZGtQcTFDV1Fiei9SekNzWDNJcXFMNnU2CjVjeFBDWEdsSm9VZUNjSUtUTzdQWE8xazVBY3NOTDlQTEFmWUJuanlVVDJUbG9ydlNrcnZmenZwblloN25sVkoKRk9BPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== interConnProtocols:
The cdr created above is most likely the problem. A colleague will follow up on this.
OK, sorry for the trouble, and thank you very much!
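This problem may have triggered a bug in the forwarding flow that causes token verification to fail. I will share the details once it is confirmed.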
This problem has been reproduced and confirmed. The fix will ship with a later release.
Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.
Issue Type
Running
Search for existing issues similar to yours
Yes
OS Platform and Distribution
Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-150-generic x86_64)
Kuscia Version
kuscia v0.9.0b0
Deployment
docker
deployment Version
docker 24.0.5
App Running type
secretflow
App Running version
secretflow-lite-anolis8:1.7.0b0
Configuration file used to run kuscia.
What happened and what you expected to happen.
Kuscia log output.