secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Running plaintext tasks over the HTTP protocol #450

Open T-ze-yu opened 2 weeks ago

T-ze-yu commented 2 weeks ago

Issue Type

Running

Search for existing issues similar to yours

Yes

OS Platform and Distribution

Ubuntu 20.04.6 LTS

Kuscia Version

kuscia 0.9.0b0

Deployment

docker

deployment Version

Docker version 24.0.5

App Running type

secretflow

App Running version

secretflow 1.7.0b0

Configuration file used to run kuscia.

mode: autonomy
domainID: p207
domainKeyData: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb2dJQkFBS0NBUUVBNDQxeGlrVXUzTGlkc1RFSmdqLzhBZGFrMEJhT1lXaWNPdmp1Q05BV2VlMTNiRGpICjI2NERRdlAzcVROOGJxQlJrRXl4VzNnOWtSS3FrcXlmbHByT01LRmJVMnROMWNyVlR3TU9yaW1pMnp1MlpvVWYKSnZtbGVTbkdwREpYMWVQQXJZRU1TOG5FbHA2bnRGTkhsYnVSWUJTeXMvaHpPLzBRZU03dHcyZnlQSVNDMHVvVAp0WDBHQlRLd3VUbC9Tek9TV1ZOOWhDZGNQekJRV2lPZXJGRURqRVBEWERaUUp4bkViNWlFTmlQdlhWQlZYcFlSCkxZVlRKbTRDMTlNMVcyQXQxNVFkTTcrcHJnZ3hRV0NRaGlKaVpSYXFMNWxnSkZCTXVsUG5qQ3lOR0tXa0s0R2cKY242R2lEbVhrQXl2aG1tamRWL2pTVXN6VDh3ZlRERXEyOVBKM1FJREFRQUJBb0lCQUhlUEg3TG91c0NaOTdEYgo5UVVNblZwUjd2S3VoMHpDN0NOSUc3bGhyQTJRS1lraEpGRldVcnhnOXlWZHVlbGVMcnpFcndOQ1lBYlRhZS82CjV6YjRTNUhkbGVCMHBzYkg5ZCszMllURXQ1Njg5dzlTcnBXSjRkbVJpNTlHSEVSemtBOFptTjVST3d0d2ZPa1EKUUxKNWhONll3WFF4L0VudW96TDJkcEtQRVFXZUZEd05jeTdNbmpHZWgyOUJ5RVVnTkIvZENmRlcwY3hSNzlhVwo5RkFxVy80MWZEM3k4c3VVbGlBR0YxcS9yWlU5ayt6UkhFYlhRTVVBQkVCeUp4cU10RHNuUXo4VmNKWTY0TW9NCm1SQUthK3hiU0FoaVdlSm1UUEhhb284WWYvbUdVeGpqQTJ3ZWR5UHA3SnpDcHhsV1ZEaHNVWElIMktmSmVnNm4Kc0hFSG9BRUNnWUVBNWgrbngza1diTnVGU1lEUngzSFFHQUhmOU1tN2psOWd3RzhleDFGekpwM0IxU3FWcVk3aApqY2JiS1lTTHVBU3BSN3RHVUdzNEdoZUd1aEtucXhRNlhXeFBHVHpkbjdrVzRZMExNUVc0L1QyUTN4d2N3ZGpZCmJta0N0VnNiSm81VFNZYVk3dTRTTWNtVUdwakFUMTRMeGR2UlhjaGVTbFVFSHR1VWRiSVY3ZDBDZ1lFQS9TUEcKZFptRzVydVU2OE93YXBVOWhkbUlJUTc5am9uaWVYSndxaWpmT3ROSlJxOENXWDZxRVZ6elhQb3hQdWhsektDcwpDWEJyNHIzY2hrOUZSRlFzcGVqWG53QktPb1dBMVo5eWN0ZHFDT2F6R0taNnAxUmZsNWx6THY4d2U5VWo4WGR3ClMvQkUzOGc2b3FzMFFsU2JOQ0hJWUxyY1JEdkpvM0t5R0pZOWpBRUNnWUFIcnM3ZkxmKzlxcWFNaWF4M1NDbDIKWTdtaVpvbklldzZ6M2dIZERhOFdmdlhWdEJKREV1NGMyYUsvaEJsV0QzSEhYMDA5cWhhNWFFZXJOcXc2WGZhRQozL1RVRnVBZlVRS2VqU0x1aEE1bEJnVXNMYmdZRUxGSkhtQmt4YUhtYTZJRU5tWXNzKzRQazNkS1hBY3ZueWd0CmR1VktpRUg5b1ZEOTVyN1NIeHYwVVFLQmdCa3BIeWE5TmMxbFE2NFRhMHVNdmVxNTdtL3F2NFVWYTI5SzBxdjMKR0FrT3l5KzlZV3huekp1aE00ZEFUdmpEdktxVUpjVmlhVGJHVEU4RlBndEdtcEY3RFVOK2tlSXpOdFVFM2lsUQpBL2dTaGlhakZYbmdSd2dZZG54clhQUlNBUnFWRnBKVnRXTFEwaE10RlNxcW9pcVNXUXBVU0dSMzFOanNJNHVTCkUxZ0JBb0dBZlcrTlpweENpL082Q0U2bmcxN
FZYejd4d29Ra2VQaFNrTk1TcGoxbU1Nb3hLbzdBa2gvZW9zY0oKZ0RwaG5mTllTYjc4U1NJcmxJOUM2QmkzMGJ2enFQVmpwRjg0RjNNdXFiZXEzN0pERVIrOUVVbXFxMUdhOHBMMAp2SGFPN3VaNTVCRXFCanBXc1dCcGpTVC9WdGcwcTFkS1h3TXZhSVZyTGtidnJpSkNCTVk9Ci0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
protocol: NOTLS
logLevel: INFO
runtime: runc
runk:
  namespace: ""
  dnsServers: []
  kubeconfigFile: ""
capacity:
  cpu: ""
  memory: ""
  pods: ""
  storage: ""
reservedResources:
  cpu: ""
  memory: ""
image:
  pullPolicy: ""
  defaultRegistry: ""
  registries: []
datastoreEndpoint: ""
enableWorkloadApprove: false

What happened and what you expected to happen.

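Creating the data and granting mutual authorization both work over HTTP. However, when I run the intersection task, the job stays in the AwaitingApproval state and never runs.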
{'status': {'code': 0, 'message': 'success', 'details': []}, 'data': {'jobs': [{'job_id': 'iotgenb8i17a572k', 'status': {'state': 'AwaitingApproval', 'err_msg': '', 'create_time': '2024-11-06T06:44:29Z', 'start_time': '2024-11-06T06:44:29Z', 'end_time': '', 'tasks': [{'task_id': 'cbw66pdv47mmkvkk', 'state': 'Pending', 'err_msg': '', 'create_time': '', 'start_time': '', 'end_time': '', 'parties': [], 'alias': 'intersection'}], 'stage_status_list': [{'domain_id': 'p207', 'state': 'JobCreateStageSucceeded'}], 'approve_status_list': [{'domain_id': 'p207', 'state': 'JobAccepted'}]}}]}}
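
A response like the one above can be easier to read when each party's stage and approval state is listed side by side. The following is a minimal Python sketch; the helper name summarize_job is hypothetical and it relies only on the field names visible in the response above.

```python
from typing import Dict, List


def summarize_job(response: dict) -> List[Dict[str, object]]:
    """Summarize each job in a job-query response shaped like the one above."""
    summaries = []
    for job in response.get("data", {}).get("jobs", []):
        status = job.get("status", {})
        summaries.append({
            "job_id": job.get("job_id"),
            "state": status.get("state"),
            # Domains that have reported a stage status so far.
            "staged": {s["domain_id"]: s["state"]
                       for s in status.get("stage_status_list", [])},
            # Domains that have reported an approval status so far.
            "approved": {a["domain_id"]: a["state"]
                         for a in status.get("approve_status_list", [])},
        })
    return summaries


# Trimmed copy of the response posted in this issue.
response = {
    "status": {"code": 0, "message": "success", "details": []},
    "data": {"jobs": [{
        "job_id": "iotgenb8i17a572k",
        "status": {
            "state": "AwaitingApproval",
            "stage_status_list": [
                {"domain_id": "p207", "state": "JobCreateStageSucceeded"}],
            "approve_status_list": [
                {"domain_id": "p207", "state": "JobAccepted"}],
        },
    }]},
}

for summary in summarize_job(response):
    print(summary)
```

In this response only p207 appears in both lists and p208 has reported nothing, which would be consistent with the job never having reached the peer node.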

Kuscia log output.

p207:
2024-11-06 16:23:40.770 INFO controller/domain_route.go:432 add cluster p207-to-p208 name:http protocol:HTTP port:11080
2024-11-06 16:23:40.770 INFO xds/cluster_config.go:131 Generate tls config for p207-to-p208-http
2024-11-06 16:23:40.770 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:23:40.770 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:23:40.770 INFO controller/domain_route.go:293 DomainRoute p207/p207-p208 starts handshake, the last revision is 0
2024-11-06 16:23:45.787 ERROR controller/handshake.go:307 DomainRoute p207-p208: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:23:45.787 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:23:45.787 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[13] key[p207/p207-p208]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.017405177s)
2024-11-06 16:24:15.026 WARN domainroute/check.go:46 Domainroute p207/p208-p207 checkEffectiveInstances failed: tokens is nil, please check the result of handshake in instance's log
2024-11-06 16:24:15.026 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p207/p208-p207] (99.107µs)
2024-11-06 16:24:15.026 WARN domainroute/check.go:138 Domainroute p207/p207-p208 token is waiting more than 2 minutes for ready, so need to re-handshake
2024-11-06 16:24:15.032 INFO domainroute/rolling.go:47 PreRollingDomainRoute p207/p207-p208, new revision 0
2024-11-06 16:24:15.032 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p207/p207-p208] (5.818168ms)
2024-11-06 16:24:15.032 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p207/p207-p208] (18.185µs)
2024-11-06 16:24:15.032 INFO controller/domain_route.go:432 add cluster p207-to-p208 name:http protocol:HTTP port:11080
2024-11-06 16:24:15.032 INFO xds/cluster_config.go:131 Generate tls config for p207-to-p208-http
2024-11-06 16:24:15.032 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:24:15.032 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:24:15.032 INFO controller/domain_route.go:293 DomainRoute p207/p207-p208 starts handshake, the last revision is 0
2024-11-06 16:24:15.037 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p207-p208 update status
2024-11-06 16:24:15.038 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p207/p207-p208] (5.443525ms)
2024-11-06 16:24:15.044 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p207-p208 update status
2024-11-06 16:24:15.044 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p207-p208] (5.52646ms)
2024-11-06 16:24:15.049 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/domaindatagrant-bd6efcb0be0c44bc0c2b137243f81162] (20.025µs)
2024-11-06 16:24:15.049 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/69d7e6049c0a11efbfd4ecd68aece617] (35.881µs)
2024-11-06 16:24:15.049 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/69d7e6049c0a11efbfd4ecd68aece617] (5.998µs)
2024-11-06 16:24:15.052 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/domaindatagrant-bd6efcb0be0c44bc0c2b137243f81162] (3.241593ms)
2024-11-06 16:24:15.088 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208-p207] (137.577µs)
2024-11-06 16:24:15.093 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p207-p208 update status
2024-11-06 16:24:15.094 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p207-p208] (5.953777ms)
2024-11-06 16:24:20.047 ERROR controller/handshake.go:307 DomainRoute p207-p208: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:20.047 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:20.047 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[14] key[p207/p207-p208]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.014657165s)
2024-11-06 16:24:26.748 INFO controller/domain_route.go:432 add cluster p207-to-p208 name:http protocol:HTTP port:11080
2024-11-06 16:24:26.748 INFO xds/cluster_config.go:131 Generate tls config for p207-to-p208-http
2024-11-06 16:24:26.748 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:24:26.748 INFO xds/xds.go:439 Add cluster:p207-to-p208-http
2024-11-06 16:24:26.748 INFO controller/domain_route.go:293 DomainRoute p207/p207-p208 starts handshake, the last revision is 0
2024-11-06 16:24:31.766 ERROR controller/handshake.go:307 DomainRoute p207-p208: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:31.766 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:31.766 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[15] key[p207/p207-p208]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.018181787s)
2024-11-06 16:24:48.442 INFO resources/kusciajob.go:91 update kuscia job eyeow7jijcee93e1
2024-11-06 16:24:48.448 INFO queue/queue.go:124 Finish processing item: queue id[kuscia-job-controller], key[cross-domain/eyeow7jijcee93e1] (5.559072ms)
2024-11-06 16:24:48.448 INFO resources/kusciajob.go:121 Start updating kuscia job "eyeow7jijcee93e1" status
2024-11-06 16:24:48.453 INFO resources/kusciajob.go:125 Finish updating kuscia job "eyeow7jijcee93e1" status
2024-11-06 16:24:48.453 INFO kusciajob/controller.go:304 Finished syncing KusciaJob "cross-domain/eyeow7jijcee93e1" (5.372698ms)
2024-11-06 16:24:48.453 INFO queue/queue.go:124 Finish processing item: queue id[kuscia-job-controller], key[cross-domain/eyeow7jijcee93e1] (5.396255ms)
2024-11-06 16:24:48.454 INFO queue/queue.go:124 Finish processing item: queue id[kuscia-job-controller], key[cross-domain/eyeow7jijcee93e1] (24.101µs)
2024-11-06 16:24:48.457 INFO queue/queue.go:124 Finish processing item: queue id[kuscia-job-controller], key[p208/eyeow7jijcee93e1] (29.582µs)
2024-11-06 16:24:48.460 INFO queue/queue.go:124 Finish processing item: queue id[interconn-kuscia-job-queue], key[cross-domain/eyeow7jijcee93e1] (12.168157ms)
2024-11-06 16:24:48.461 INFO queue/queue.go:124 Finish processing item: queue id[interconn-kuscia-jobsummary-queue], key[p208/eyeow7jijcee93e1] (20.156µs)
2024-11-06 16:24:48.465 INFO queue/queue.go:124 Finish processing item: queue id[interconn-kuscia-job-queue], key[cross-domain/eyeow7jijcee93e1] (4.870945ms)
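
The repeated WRONG_VERSION_NUMBER error in the log above usually appears when a TLS client connects to a port that answers in plaintext. As a rough aid for spotting this pattern, the Python sketch below counts such handshake failures per route. The function name tls_mismatch_routes and the regular expression are assumptions written against the log lines shown here, not part of Kuscia.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the route name in handshake failure lines like the ones above.
HANDSHAKE_FAIL = re.compile(r"DomainRoute (?P<route>\S+): handshake fail")


def tls_mismatch_routes(lines: Iterable[str]) -> Counter:
    """Count handshake failures per route that mention WRONG_VERSION_NUMBER."""
    counts: Counter = Counter()
    for line in lines:
        match = HANDSHAKE_FAIL.search(line)
        if match and "WRONG_VERSION_NUMBER" in line:
            counts[match.group("route")] += 1
    return counts


# Two lines copied from the p207 log in this issue.
sample = [
    "2024-11-06 16:23:40.770 INFO controller/domain_route.go:293 "
    "DomainRoute p207/p207-p208 starts handshake, the last revision is 0",
    "2024-11-06 16:23:45.787 ERROR controller/handshake.go:307 "
    "DomainRoute p207-p208: handshake fail:response status code [503], "
    "detail -> upstream connect error or disconnect/reset before headers. "
    "reset reason: remote connection failure, transport failure reason: "
    "TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL",
]

counts = tls_mismatch_routes(sample)
print(dict(counts))
```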

p208:
2024-11-06 16:23:23.644 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[13] key[p208/p208-p207]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.017339285s)
2024-11-06 16:23:52.874 WARN domainroute/check.go:46 Domainroute p208/p207-p208 checkEffectiveInstances failed: tokens is nil, please check the result of handshake in instance's log
2024-11-06 16:23:52.874 WARN domainroute/check.go:138 Domainroute p208/p208-p207 token is waiting more than 2 minutes for ready, so need to re-handshake
2024-11-06 16:23:52.874 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p207-p208] (118.391µs)
2024-11-06 16:23:52.881 INFO domainroute/rolling.go:47 PreRollingDomainRoute p208/p208-p207, new revision 0
2024-11-06 16:23:52.881 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p208-p207] (6.966957ms)
2024-11-06 16:23:52.881 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p208-p207] (12.677µs)
2024-11-06 16:23:52.882 INFO controller/domain_route.go:432 add cluster p208-to-p207 name:http protocol:HTTP port:11080
2024-11-06 16:23:52.882 INFO xds/cluster_config.go:131 Generate tls config for p208-to-p207-http
2024-11-06 16:23:52.882 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:23:52.882 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:23:52.882 INFO controller/domain_route.go:293 DomainRoute p208/p208-p207 starts handshake, the last revision is 0
2024-11-06 16:23:52.888 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:23:52.888 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208/p208-p207] (6.944279ms)
2024-11-06 16:23:52.896 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:23:52.896 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208-p207] (6.891297ms)
2024-11-06 16:23:52.900 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (22.362µs)
2024-11-06 16:23:52.900 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/69da9f849c0a11efbb5decd68aece6cb] (8.102µs)
2024-11-06 16:23:52.900 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/69da9f849c0a11efbb5decd68aece6cb] (40.424µs)
2024-11-06 16:23:52.904 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (4.106769ms)
2024-11-06 16:23:52.931 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p207-p208] (205.785µs)
2024-11-06 16:23:52.938 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:23:52.939 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208-p207] (7.231809ms)
2024-11-06 16:23:57.898 ERROR controller/handshake.go:307 DomainRoute p208-p207: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:23:57.898 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:23:57.898 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[14] key[p208/p208-p207]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.015949299s)
2024-11-06 16:24:04.604 INFO controller/domain_route.go:432 add cluster p208-to-p207 name:http protocol:HTTP port:11080
2024-11-06 16:24:04.605 INFO xds/cluster_config.go:131 Generate tls config for p208-to-p207-http
2024-11-06 16:24:04.605 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:24:04.605 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:24:04.605 INFO controller/domain_route.go:293 DomainRoute p208/p208-p207 starts handshake, the last revision is 0
2024-11-06 16:24:09.621 ERROR controller/handshake.go:307 DomainRoute p208-p207: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:09.622 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:24:09.622 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[15] key[p208/p208-p207]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.017143791s)
2024-11-06 16:24:52.901 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (33.888µs)
2024-11-06 16:24:52.901 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/69da9f849c0a11efbb5decd68aece6cb] (8.358µs)
2024-11-06 16:24:52.901 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/69da9f849c0a11efbb5decd68aece6cb] (88.965µs)
2024-11-06 16:24:52.905 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (4.117694ms)
2024-11-06 16:25:52.875 WARN domainroute/check.go:138 Domainroute p208/p208-p207 token is waiting more than 2 minutes for ready, so need to re-handshake
2024-11-06 16:25:52.875 WARN domainroute/check.go:46 Domainroute p208/p207-p208 checkEffectiveInstances failed: tokens is nil, please check the result of handshake in instance's log
2024-11-06 16:25:52.875 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p207-p208] (100.901µs)
2024-11-06 16:25:52.883 INFO domainroute/rolling.go:47 PreRollingDomainRoute p208/p208-p207, new revision 0
2024-11-06 16:25:52.883 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p208-p207] (8.136781ms)
2024-11-06 16:25:52.883 INFO queue/queue.go:176 Finish processing item: queue id[domain-route-controller], key[p208/p208-p207] (12.862µs)
2024-11-06 16:25:52.883 INFO controller/domain_route.go:432 add cluster p208-to-p207 name:http protocol:HTTP port:11080
2024-11-06 16:25:52.884 INFO xds/cluster_config.go:131 Generate tls config for p208-to-p207-http
2024-11-06 16:25:52.884 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:25:52.884 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:25:52.884 INFO controller/domain_route.go:293 DomainRoute p208/p208-p207 starts handshake, the last revision is 0
2024-11-06 16:25:52.891 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:25:52.891 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208/p208-p207] (7.761ms)
2024-11-06 16:25:52.899 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:25:52.899 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208-p207] (7.433496ms)
2024-11-06 16:25:52.901 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (17.468µs)
2024-11-06 16:25:52.901 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/69da9f849c0a11efbb5decd68aece6cb] (3.963µs)
2024-11-06 16:25:52.902 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/69da9f849c0a11efbb5decd68aece6cb] (91.031µs)
2024-11-06 16:25:52.905 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (3.841695ms)
2024-11-06 16:25:52.932 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p207-p208] (130.323µs)
2024-11-06 16:25:52.939 INFO clusterdomainroute/domainroute.go:143 ClusterDomainRoute p208-p207 update status
2024-11-06 16:25:52.939 INFO queue/queue.go:176 Finish processing item: queue id[cluster-domain-route-controller], key[p208-p207] (7.025526ms)
2024-11-06 16:25:57.900 ERROR controller/handshake.go:307 DomainRoute p208-p207: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:25:57.900 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:25:57.900 ERROR queue/queue.go:115 Forgetting: queue id[domain-route-queue], key[p208/p208-p207] (5.016400742s), due to maximum retries[16] reached, last error: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL"
2024-11-06 16:26:52.902 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/69da9f849c0a11efbb5decd68aece6cb] (66.664µs)
2024-11-06 16:26:52.902 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (18.355µs)
2024-11-06 16:26:52.902 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p207/69da9f849c0a11efbb5decd68aece6cb] (14.036µs)
2024-11-06 16:26:52.907 INFO queue/queue.go:124 Finish processing item: queue id[domaindatagrant_controller], key[p208/domaindatagrant-7f02cc39a349fe0968e2368a58fe1dda] (4.216477ms)
2024-11-06 16:26:53.463 INFO controller/domain_route.go:432 add cluster p208-to-p207 name:http protocol:HTTP port:11080
2024-11-06 16:26:53.463 INFO xds/cluster_config.go:131 Generate tls config for p208-to-p207-http
2024-11-06 16:26:53.463 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:26:53.463 INFO xds/xds.go:439 Add cluster:p208-to-p207-http
2024-11-06 16:26:53.463 INFO controller/domain_route.go:293 DomainRoute p208/p208-p207 starts handshake, the last revision is 0
2024-11-06 16:26:58.479 ERROR controller/handshake.go:307 DomainRoute p208-p207: handshake fail:response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:26:58.479 ERROR controller/domain_route.go:297 response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL
2024-11-06 16:26:58.479 INFO queue/queue.go:109 Re-syncing: queue id[domain-route-queue], retry:[0] key[p208/p208-p207]: "response status code [503], detail -> upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TL", re-queuing (5.016320863s)
2024-11-06 16:26:58.485 INFO controller/domain_route.go:432 add cluster p208-to-p207 name:http protocol:HTTP port:11080
2024-11-06 16:26:58.485 INFO xds/cluster_config.go:131 Generate tls config for p208-to-p207-http
BrainWH commented 2 weeks ago

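1. Run kubectl get cdr inside the corresponding container to check the authorization information, and look at the returned value.
2. Run curl -kvvv http://1.1.1.1:18080 (the IP and port here are examples) and look at the result.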

T-ze-yu commented 2 weeks ago

[image] [image] It looks like the authorization did not succeed.

BrainWH commented 2 weeks ago

You can run kubectl get cdr -n xxx xxx -o yaml to look at the YAML of the corresponding route.

T-ze-yu commented 2 weeks ago

kubectl get cdr -n xxx xxx -o yaml

kubectl get cdr -n p208 -o yaml apiVersion: v1 items:

BrainWH commented 2 weeks ago

Could you paste the commands you used to create the route?

T-ze-yu commented 2 weeks ago

Could you paste the commands you used to create the route?

I followed https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.9.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn#id7. After the two sides issued certificates to each other, I ran scripts/deploy/add_domain.sh p207 p2p on 208 and scripts/deploy/add_domain.sh p208 p2p on 207. Then I ran scripts/deploy/join_to_host.sh p208 p207 https://192.168.210.207:11080 on 208 and scripts/deploy/join_to_host.sh p207 p208 https://192.168.210.208:11080 on 207.

T-ze-yu commented 2 weeks ago

Could you paste the commands you used to create the route?

I followed https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.9.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn#id7. After the two sides issued certificates to each other, I ran scripts/deploy/add_domain.sh p207 p2p on 208 and scripts/deploy/add_domain.sh p208 p2p on 207. Then I ran scripts/deploy/join_to_host.sh p208 p207 https://192.168.210.207:11080 on 208 and scripts/deploy/join_to_host.sh p207 p208 https://192.168.210.208:11080 on 207.

Replacing https with http in those commands seems to fix it.

BrainWH commented 2 weeks ago

When Protocol is set to NOTLS, the nodes communicate with each other over HTTP.
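
To make the rule above concrete, here is a small Python sketch that picks the URL scheme for a peer endpoint from the configured protocol. The helper peer_endpoint is hypothetical. The NOTLS to http mapping comes from this thread, while treating every other protocol value as https is an assumption of the sketch.

```python
def peer_endpoint(protocol: str, host: str, port: int) -> str:
    """Build a peer URL whose scheme matches the configured protocol."""
    # NOTLS -> http is stated in this thread; mapping any other value
    # to https is an assumption made for illustration.
    scheme = "http" if protocol.strip().upper() == "NOTLS" else "https"
    return f"{scheme}://{host}:{port}"


# With protocol: NOTLS from the configuration file in this issue:
print(peer_endpoint("NOTLS", "192.168.210.207", 11080))
# prints http://192.168.210.207:11080
```

The resulting URL is what would be passed as the last argument to scripts/deploy/join_to_host.sh in the commands quoted earlier.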

T-ze-yu commented 2 weeks ago

But running the intersection task still fails with an error:
domain_data = get_domain_data(domaindata_stub, domaindata_id)
2024-11-07T15:45:07.381023627+08:00 stderr F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/datamesh.py", line 81, in get_domain_data
2024-11-07T15:45:07.381047985+08:00 stderr F raise RuntimeError(f"get_dist_data failed for {id}: status = {ret.status}")
2024-11-07T15:45:07.381051804+08:00 stderr F RuntimeError: get_dist_data failed for 9dd7d6909cdb11ef9d84ecd68aece6cb: status = code: 12201
2024-11-07T15:45:07.381053666+08:00 stderr F message: "domaindatas.kuscia.secretflow \"9dd7d6909cdb11ef9d84ecd68aece6cb\" not found"
2024-11-07T15:45:07.381055566+08:00 stderr F

T-ze-yu commented 2 weeks ago

But running the intersection task still fails with an error:
domain_data = get_domain_data(domaindata_stub, domaindata_id)
2024-11-07T15:45:07.381023627+08:00 stderr F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/datamesh.py", line 81, in get_domain_data
2024-11-07T15:45:07.381047985+08:00 stderr F raise RuntimeError(f"get_dist_data failed for {id}: status = {ret.status}")
2024-11-07T15:45:07.381051804+08:00 stderr F RuntimeError: get_dist_data failed for 9dd7d6909cdb11ef9d84ecd68aece6cb: status = code: 12201
2024-11-07T15:45:07.381053666+08:00 stderr F message: "domaindatas.kuscia.secretflow "9dd7d6909cdb11ef9d84ecd68aece6cb" not found"
2024-11-07T15:45:07.381055566+08:00 stderr F

On analysis, 9dd7d6909cdb11ef9d84ecd68aece6cb is the partner's data, but the partner has already performed the data grant operation:
json_body: {'domain_id': 'p208', 'domaindata_id': '9dd7d6909cdb11ef9d84ecd68aece6cb', 'grant_domain': 'p207'}
response: {'status': {'code': 0, 'message': 'success', 'details': []}, 'data': {'domaindatagrant_id': 'domaindatagrant-fa80775cccd18aa84ae80645a864014b'}}

T-ze-yu commented 2 weeks ago

The authorization is still not in place. [image]

BrainWH commented 2 weeks ago

Run kubectl get cdr -n xxx xxx -o yaml to look at the YAML of the corresponding route.

T-ze-yu commented 2 weeks ago

I just ran the authorization again and it is normal now. [image] But the data grant still has the same problem as before.

T-ze-yu commented 2 weeks ago

kubectl get cdr -n xxx xxx -o yaml

kubectl get cdr -n p207-p208 -o yaml apiVersion: v1 items:

BrainWH commented 2 weeks ago

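The kubectl get cdr output in the screenshot above shows true. Where is the problem now? Is it working at the moment?

I just ran the authorization again and it is normal now. [image] But the data grant still has the same problem as before.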

T-ze-yu commented 2 weeks ago

But running the intersection task still fails with an error:
domain_data = get_domain_data(domaindata_stub, domaindata_id)
2024-11-07T15:45:07.381023627+08:00 stderr F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/datamesh.py", line 81, in get_domain_data
2024-11-07T15:45:07.381047985+08:00 stderr F raise RuntimeError(f"get_dist_data failed for {id}: status = {ret.status}")
2024-11-07T15:45:07.381051804+08:00 stderr F RuntimeError: get_dist_data failed for 9dd7d6909cdb11ef9d84ecd68aece6cb: status = code: 12201
2024-11-07T15:45:07.381053666+08:00 stderr F message: "domaindatas.kuscia.secretflow "9dd7d6909cdb11ef9d84ecd68aece6cb" not found"
2024-11-07T15:45:07.381055566+08:00 stderr F

On analysis, 9dd7d6909cdb11ef9d84ecd68aece6cb is the partner's data, but the partner has already performed the data grant operation:
json_body: {'domain_id': 'p208', 'domaindata_id': '9dd7d6909cdb11ef9d84ecd68aece6cb', 'grant_domain': 'p207'}
response: {'status': {'code': 0, 'message': 'success', 'details': []}, 'data': {'domaindatagrant_id': 'domaindatagrant-fa80775cccd18aa84ae80645a864014b'}}

This problem is still there: 207 cannot get the granted data.

BrainWH commented 2 weeks ago

Make sure that domaindata and domaindatagrant have been created for both alice's and bob's data. You can re-run the steps at: https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.12.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn#id9

T-ze-yu commented 2 weeks ago

[image] [image]

The authorization does not succeed.

BrainWH commented 2 weeks ago

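[image] [image]

The authorization does not succeed.

If your Protocol is set to NOTLS, the nodes communicate over HTTP. I see that your authorization command still uses https; change it to http and it should work.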

T-ze-yu commented 2 weeks ago

Sorry! I forgot to update the command shown earlier. I am actually using http, but one side still does not get authorized. [image] [image]

BrainWH commented 2 weeks ago

First delete all of the cdr, domaindata, and domaindatagrant resources on the failing side, then redo the authorization.

T-ze-yu commented 2 weeks ago

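First delete all of the cdr, domaindata, and domaindatagrant resources on the failing side, then redo the authorization.

Is there a command to delete them, or do I need to redeploy?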

BrainWH commented 2 weeks ago

You can delete them with kubectl delete.
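
As an illustration of the cleanup described above, the Python sketch below only builds the kubectl delete command lines and prints them for review; it does not run anything. The helper cleanup_commands is hypothetical, the resource names and IDs are taken from earlier in this thread purely as examples, and it assumes that cdr is cluster-scoped while domaindata and domaindatagrant live in the domain's namespace. Adjust the names and add -n where your deployment differs.

```python
import shlex
from typing import List


def cleanup_commands(route: str, namespace: str,
                     domaindata_ids: List[str],
                     grant_ids: List[str]) -> List[str]:
    """Build kubectl delete commands for a route and its data resources."""
    # Assumption: cdr is cluster-scoped, so no -n flag is added here.
    commands = [f"kubectl delete cdr {shlex.quote(route)}"]
    for data_id in domaindata_ids:
        commands.append(
            f"kubectl delete domaindata {shlex.quote(data_id)} "
            f"-n {shlex.quote(namespace)}")
    for grant_id in grant_ids:
        commands.append(
            f"kubectl delete domaindatagrant {shlex.quote(grant_id)} "
            f"-n {shlex.quote(namespace)}")
    return commands


# Example values copied from this thread; which side is the failing one
# is not stated, so treat them as placeholders.
commands = cleanup_commands(
    route="p208-p207",
    namespace="p208",
    domaindata_ids=["9dd7d6909cdb11ef9d84ecd68aece6cb"],
    grant_ids=["domaindatagrant-fa80775cccd18aa84ae80645a864014b"],
)
for command in commands:
    print(command)
```

After the resources are deleted, the route and data grants would be created again with the same commands used earlier in the thread.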

T-ze-yu commented 1 week ago

OK, it works now. Thanks.