secretflow / kuscia

Kuscia(Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Authorization error after reinstalling Kuscia #408

Closed RotaercAH closed 2 months ago

RotaercAH commented 2 months ago

Issue Type

Install/Deploy

Search for existing issues similar to yours

Yes

OS Platform and Distribution

CentOS Linux release 7.9.2009 (Core)

Kuscia Version

kuscia v0.10.0b0

Deployment

k8s

deployment Version

k8s v1.16.9

App Running type

secretflow

App Running version

secretflow 1.7.0b0

Configuration file used to run kuscia.

# configmap_alice.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kuscia-autonomy-alice
  namespace: autonomy-alice
data:
  kuscia.yaml: |-
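    # Startup mode
    mode: autonomy

    # Node ID
    # Example: domainID: alice
    domainID: alice
    # Node private key. Used for authentication between nodes (the identity token for communication is generated from both parties' certificates) and for issuing certificates to node applications (to harden communication, Kuscia assigns an MTLS certificate to every task engine; the engine's access to other modules, including external ones, and other modules' access to the engine all go over MTLS, so that the engine cannot be compromised from the inside).
    # Note: the node private key currently only supports the PKCS#1 format: "BEGIN RSA PRIVATE KEY/END RSA PRIVATE KEY"
    # Run "docker run -it --rm secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia scripts/deploy/generate_rsa_key.sh" to generate a private key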
    domainKeyData: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ3FMazZDQmtSRzBKNXYKc0hBZ0NOMTNWcTVOdWRhRk1qaW9vWHhEUWx3QU1YVGc1VlRZeWZEWEUxN0tLK0Z5d2ZEZFlsdUpJNFkvOWNuawppNTdtWUg1Y2kwZ05iNkFEa3p3Zi9hMXRBSVNCcmxOcUtxZnJaNmY4WHI5Rkx0WFUxb3h2dGhSdUNyTGJTZ0tkCjhJa0hkd2ZieHFYVUphdm0wamlmeFV1REdldXVEMXFBQzkwSG9mTHhWZ1hTc21iT0l0RDhiaXVDSExEWWJYaWoKZzFtVGdSNDU5R1dwcmt6ek8vNVQ3V05icm9FMk4zY3h5OFMyU3lkQkJ5ZkF4a2VxS01mNG41NVE4THpFK2ttVgptdkJZVi96K2FDVEhWRlYzR2RDMklobGxWSWFzSS8rdTY2SGcxMnllbXNNbFFINWZUQ1NtMmVpeDg3UE9UdHJGCnBmT2RaVUhUQWdNQkFBRUNnZ0VBR2YzbmxMbFRUVU9JcDBOWjVMS2w3SmVyR0lqMUtEUEM3cEozY2FoZGQ5UVYKNTFGdmM0cm9RMWtjaGFGTkZpTmozOVFwYWRrb3BIVXNTRUZBM0N2SnNPVys4L3BrQkpmRXU1Z1ptRWZYZFIwRQpkWGNkWFhsZjhVNGhSWFpCUjNnYlMrYVIyVHErRlhzSXlrbVdERE5VV205TkhZbEJaNGdkQ04zdnlnNjM3Y1gvCnlLNit5OC82VTBZamFJYWJubDF6bFBNcnh4NTc4Y0lmdXlGUXVvV1lvbkVTL2dTbkJSU2ZtNkpSV1NHRWVkdm8Kdk05VHBVUVhGVmFxeFBlbEd1Q1hhT0FlRzdoTUxWRFh5RERrU1pwQUpaTDVaQWpKa0VXV3NsQWlPQXJId2x5YQprTzZFa1pzWVN0Zlp2c3FSY1Z3d2hKeGZtR1lvdmZrQkdnTG5JVXJCbFFLQmdRRFJzUmlESElma0xuak9tTGE0CjdrWFV6bldIeWpBS2xEd3EyV2FyRnovSnk5MjA4TnBscWVSRWlyV3UwYnVKYkErZXJLbW9keHk3aDJsU0piaXYKaDZHdVhIUDJoU2NnSWM0R3dNOVJTdUNKWTI2T0tjaDB6cVlsT3UrRWpDbmM2K0FDWXh4RUZFRzM4YXZHZlZ6Nwp4ZlI0c0Nwbk40d3ZzODM0NjFKVkdJV2w5UUtCZ1FEUHczZGk2Q0RFaS80TTNuS0FEbWpCOVlHa2l0Q2hNamxTCmRwNTAwUHFzUk5FZ1IrVC85Z240aEVFdWRhVTZVL2tucFVzRkVxWGpoZ0xiRE1QT2VFcHo1S042WUxSWktuRTUKTTN3b2U5bXh1N1Z4V2ZWQ2Z4RURLNGlhd0hYbUs0VlREUEZiUTVXWDRQMCtQYmRJcFhMOVhRZU93ZnYzY2xMNAowY1Y5elgranB3S0JnUUNUTmNub21jb0k5bHNYWnZ5NFhZYW12SDZrWXR4UlFQbndkd2x0eVhlZHVzS2QrWXpKClhIa0ZhWC9kQ0I2cGZqU0ZCK0JmaGFlbE80NUQvbmxtdVVoWGVVNXIzZFMyNlNTVGR4N1Vpa1dTRGowYUR0bE0KcjVyU2ZrcVNlamdWZ1g2VkRuRlVsZ2dCRStldEJHdVgwY1FzU2ppcWw4T1I1YUFQUlYxYW9rbUpWUUtCZ0VwZwpHZ3dCTjBIRkw4UWhtZkczdHM3QWVaR1MxQTd3c002UmdqWWxYYWR2MTBGc0cxRjZIYVdtaXNMOEFKTTUzbmJQCjJHUlBnYTFLbXhrWm43cjVHd1lUOG1YcjJvUVZDb1ZFcGd6RUVYRnIxZzltK2NLOVJEVFRUOHErWFRaeG0vL1kKSVVyZmpkemFBUzVYMzVZVkRHNGc4SVN0Y3Vyc
E5VUzNxN0JXY1h2L0FvR0FmMm8zSUhzdE80Y3d6Q2gzZzYvSQo3RmVReVB6T2pZNzlyVFJuM0VVZHdXcG9YOEZWczg4UlJDYzkrZkVxWmVQbk9BVHJqcVQvdGUxUlA5TTlJbG1MCmlHRTdvcWxyZ25obHlnWlVnQkpiTGx1VElZZkdsWkh0UVFiTGZUNzM3OFlQWklhRThpb2wrbzVmRDQ5MlFlb1EKWFpxd3o5LzltU3ZORWxKU0ZMdm1wRWs9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K

    # Log level: INFO, DEBUG, WARN
    logLevel: INFO

    # runc or runk
    runtime: runk

    runk:
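      # Schedule tasks into the specified organization's K8s namespace
      namespace: autonomy-alice
      # Pod DNS configuration for the K8s cluster, used to resolve the node's application domain names. This is the DNS address used by pods launched by runk and should be set to the clusterIP of the kuscia-autonomy service; "1.1.1.1" is used here as an example
      dnsServers:
      # - kuscia-dns-lb-server
        - 10.99.167.225
      # kubeconfig of the K8s cluster; if empty, the serviceaccount is used by default. Leave it empty for now to use the serviceaccount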
      kubeconfigFile:

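    # Schedulable capacity of the node. With runc, leaving this empty auto-detects the current container's system resources; in runk mode it must be configured manually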
    capacity:
      cpu: 4
      memory: 4Gi
      storage: 100Gi

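    # Agent image configuration; set this when images are stored in a private registry (not needed by default)
    image:
      pullPolicy: # use the image registry | use local images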
      defaultRegistry: ""
      registries:
        - name: ""
          endpoint: ""
          username: ""
          password: ""

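    ####### Master node configuration #########
    # Database connection string; if empty, sqlite is used by default (dsnXXXX) dns://
    datastoreEndpoint: "mysql://kuscia:Kuscia.2024@tcp(10.2.1.4:3306)/kuscia_alice"
    # Communication protocol used by KusciaAPI and the node's external gateway: NOTLS/TLS/MTLS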
    protocol: NOTLS

# configmap_bob.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: kuscia-autonomy-bob
  namespace: autonomy-bob
data:
  kuscia.yaml: |-
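    # Startup mode
    mode: autonomy

    # Node ID
    # Example: domainID: alice
    domainID: bob
    # Node private key. Used for authentication between nodes (the identity token for communication is generated from both parties' certificates) and for issuing certificates to node applications (to harden communication, Kuscia assigns an MTLS certificate to every task engine; the engine's access to other modules, including external ones, and other modules' access to the engine all go over MTLS, so that the engine cannot be compromised from the inside).
    # Note: the node private key currently only supports the PKCS#1 format: "BEGIN RSA PRIVATE KEY/END RSA PRIVATE KEY"
    # Run "docker run -it --rm secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia scripts/deploy/generate_rsa_key.sh" to generate a private key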
    domainKeyData: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ0M1bnJqbUZ2VnRPc2EKQ1VPQnE2RUtISGs4STFiajFhMTJHUGk3ZHVwczA4YWdRb1dpNGV0bDg2Y3EwYUpYRlBPclVMVVhGb2NlMUJzSgpLR3l3OG5wcGtEMk4xVTdzTHVrc0RQakhiRDZoWDR0NmhWcEVOMFhDbTVNM1pjTlRBRUkvdmdqK3hvb25uYkJuCkRwZ1hNR2c1dVVmVGtHMjZYWHRZMjl1TUFkNmZiQzBXelh0UUI2SVQvOTFJSEdLY0hjSnR4dlJuUE1vbUI1V3UKT05IcXg3K2Z5cm5SZmc4elFiRm9sZmp6Q2VGVHI4RFhJSitVSEZCY1NFdDN2TUJxa0FvbmdOeHF0UnYvcW9EZApOVllsMDhKdDdDd2ZjTWNvTk1mbFRXWlhPKzB4QTZ6OU9LVlM2SnV5VEZJYUlEZnpFN1ZYOFpKVHJ2amkzU1V6Ck0vbjduWXNMQWdNQkFBRUNnZ0VBQVdJZFVRd09HWVp3S3pGdXhiem11eEFIRDZpL1J0QXI3bjNwSEU4c1hZaEgKZVB3M0tqWHRmREkvK1NYN2NZeEhXTGhxZnRCcWlNbDEwd01Sd0g0c3p0Y0IwaTN5WURQWkZhaUpWcXE1Yk9YWQptWmx5V0ZmZEZwZ2RkOGwzNmQwZ1NlZUlOdkg1RDlIZkxQQmgvb25obFFkV2QxUXVMS211MldpRGRicDJxYXV5CmpzSDFKZlpuWWwwV0NpL2pSRUx3aWRxajgzZys4SUdxWW4xMmhqVWRaNzYydzFuenM3ditrVm9zcGxZQXB3TkoKekEvNlNzWEpsYklXT0F4amhseWd4K2pkemFYVzJ6WFlnM0VpYTc3QVgweHVUUU9kRkIvTUpEeHlwblRKZTRSWQpLTFVRR3JiK0huN0ViWnNmUG5BNFpraVliazNzSlNKb3haeEl5Tk9Nc1FLQmdRQzRndURBMSt3WUtlZE9keWlNClJUNzFZSGpDU082UDBTejViQVVXMU14b2twNGtWU0VGWGd5dGRDc2x0MitCaW01N0lKRWNJZlRhRk9VRkszUVIKd2VuWFF5MEJxVFVzbDlXcUw5U05XVkRDNG9IUnZIeXdwcjhiY2o1ZEtuc1hoMUdiN3NDVHM5OXdNejUvZWtLaQpmeWg1UFE4WDI1OWN6akVybkpjc2NvZTUwUUtCZ1FDMW5oZmZFVTdlL2xCOWFZRDJ0MjVhRDhQbmlCb3F6MGlZClQ1bllldnhOUExXVWhScE1BRzF2Q3VsZUo5bUVwZ3UzaW96RyswOE10d1VyTVhRTGhqVGJjbkJJWjQrQXdhSm4KZi85UEprRlp6d0xKdTZneVBRUENMUXduNW1iMlJmYUV6dVV2WThrZFczZ0N6aFJoOWc2KzhTUjh1Vks2aFVmSwpYeG1yTzBoU0d3S0JnQ0VFeXRPMzBEaEN4M0h6UVA1Wkpmc2pXSGpzTkVUb1dmUUlzS0IxVkY4aVhjcUNzWFlVCmJwQmJ5WnptUnI0WDE4MlE5bWJpYkw3YUhtSGVkTmI0ckxBcEJWVFd3djFIN3FTV0NxT0E2RUwzNWVOeXA1MjEKT1YzZ0Era0lRUjdreUdYdlErY3F1VUdLNmhSRi9NYTNtcmFYaHF2dVVZWjZIN0orUTA5Zzc0a0JBb0dBWG1tcAo5UzlWTmYwMHNJMXBHbGh2Q0dpTHFkQUo4bGxCWHRSNm9Kd0dqc3hSaEx6UTE5T2RFQTIzRlZoWDdtbzNTeG0rClp5NTdnSnVnRnowbEcxeVFHOGhZOEhyTmtkeVhaWUNYbzNpNm5rcE1JN3puQ2Y3SDltaGVtbHRmQ1FXRHlyU1gKVmRSazExc1dmemJNUjhTWEU5SGQ2dXlZUWhoS
klyM2ZaVEZ6UGlNQ2dZRUFvVVl1L1d0Z2F1WDNYQ2tXUXA2VQpwTVNPWHhMQzF3NE1iVXZseDhrcEdjcktScWdLNXdCK2dGZSthZFM0dnd5aFZ5U2RoWDR6cjFTZmh1M2w2dkVjCm56eU5hZVJlWjA4V2cxalQ3WGZPUXRWcTBOYkJONHV0YnFEbzJLWjhJVU9icFFVVFpvd2R2aGFVYWg4R0dUSk4KclRUSlhxY0thSGNYZDRTcm9LT1RHeEE9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K

    # Log level: INFO, DEBUG, WARN
    logLevel: INFO

    # runc or runk
    runtime: runk

    runk:
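      # Schedule tasks into the specified organization's K8s namespace
      namespace: autonomy-bob
      # Pod DNS configuration for the K8s cluster, used to resolve the node's application domain names. This is the DNS address used by pods launched by runk and should be set to the clusterIP of the kuscia-autonomy service; "1.1.1.1" is used here as an example
      dnsServers:
      # - kuscia-dns-lb-server
        - 10.111.133.183
      # kubeconfig of the K8s cluster; if empty, the serviceaccount is used by default. Leave it empty for now to use the serviceaccount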
      kubeconfigFile:

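    # Schedulable capacity of the node. With runc, leaving this empty auto-detects the current container's system resources; in runk mode it must be configured manually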
    capacity:
      cpu: 4
      memory: 4Gi
      storage: 100Gi

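    # Agent image configuration; set this when images are stored in a private registry (not needed by default)
    image:
      pullPolicy: # use the image registry | use local images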
      defaultRegistry: ""
      registries:
        - name: ""
          endpoint: ""
          username: ""
          password: ""

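    ####### Master node configuration #########
    # Database connection string; if empty, sqlite is used by default (dsnXXXX) dns://
    datastoreEndpoint: "mysql://kuscia:Kuscia.2024@tcp(10.2.1.4:3306)/kuscia_bob"
    # Communication protocol used by KusciaAPI and the node's external gateway: NOTLS/TLS/MTLS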
    protocol: NOTLS
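
The config comments above state that the node private key must be in PKCS#1 format ("BEGIN RSA PRIVATE KEY"). As a rough aid for checking which PEM header a `domainKeyData` value actually carries, here is a minimal Python sketch. The helper names `pem_label` and `is_pkcs1_rsa_private_key` are hypothetical and not part of Kuscia, and the sample is only the first 40 characters of alice's value from the config above. Whether Kuscia accepts other headers in practice is not something this issue establishes.

```python
import base64


def pem_label(domain_key_data: str) -> str:
    """Return the label of the first PEM header in a base64-encoded PEM blob.

    Hypothetical helper: it only inspects the header line, it does not
    validate the key material itself.
    """
    pem_text = base64.b64decode(domain_key_data).decode("ascii", errors="replace")
    lines = pem_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    begin, end = "-----BEGIN ", "-----"
    if not (first_line.startswith(begin) and first_line.endswith(end)):
        raise ValueError(f"not a PEM header: {first_line!r}")
    return first_line[len(begin):-len(end)]


def is_pkcs1_rsa_private_key(domain_key_data: str) -> bool:
    """True if the header is the PKCS#1 one named in the config comment."""
    return pem_label(domain_key_data) == "RSA PRIVATE KEY"


# First 40 base64 characters of alice's domainKeyData (enough for the header).
ALICE_KEY_PREFIX = "LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1J"

print(pem_label(ALICE_KEY_PREFIX))                 # PRIVATE KEY
print(is_pkcs1_rsa_private_key(ALICE_KEY_PREFIX))  # False
```

Only the header line is decoded here, so the check can be run on a truncated prefix without exposing the full key.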

What happened and what you expected to happen.

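After the test job ran successfully, I wanted to change the communication protocol used by KusciaAPI and the node's external gateway to TLS. I edited the configmap.yaml configuration, restarted the pod, and went through the authorization steps again, but an authorization error occurred.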

Kuscia log output.

# Test connectivity with curl -kvvv
(base) [root@tg-tee-1 kuscia]# curl -kvvv http://10.2.1.4:30201
* About to connect() to 10.2.1.4 port 30201 (#0)
*   Trying 10.2.1.4...
* Connected to 10.2.1.4 (10.2.1.4) port 30201 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.2.1.4:30201
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< x-accel-buffering: no
< content-length: 13
< content-type: text/plain
< kuscia-error-message: <alice/kuscia-autonomy-alice-c798bfb4-vtf7v/external $$ Unauthorized>
< date: Wed, 28 Aug 2024 08:22:24 GMT
< server: kuscia-gateway
<
* Connection #0 to host 10.2.1.4 left intact
unauthorized.
(base) [root@tg-tee-1 kuscia]# curl -kvvv http://10.2.1.4:30211
* About to connect() to 10.2.1.4 port 30211 (#0)
*   Trying 10.2.1.4...
* Connected to 10.2.1.4 (10.2.1.4) port 30211 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.2.1.4:30211
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< x-accel-buffering: no
< content-length: 13
< content-type: text/plain
< kuscia-error-message: <bob/kuscia-autonomy-bob-bcc6d5b9b-frxgk/external $$ Unauthorized>
< date: Wed, 28 Aug 2024 08:22:28 GMT
< server: kuscia-gateway
<
* Connection #0 to host 10.2.1.4 left intact

# Check the authorization info with kubectl get cdr
[root@kuscia-autonomy-alice-c798bfb4-vtf7v kuscia]# kubectl get cdr
NAME        SOURCE   DESTINATION   HOST       AUTHENTICATION   READY
alice-bob   alice    bob           10.2.1.4   Token            False
# Error details from kubectl get cdr
[root@kuscia-autonomy-alice-c798bfb4-vtf7v kuscia]# kubectl get cdr alice-bob -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: ClusterDomainRoute
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"ClusterDomainRoute","metadata":{"annotations":{},"name":"alice-bob"},"spec":{"authenticationType":"Token","destination":"bob","endpoint":{"host":"10.2.1.4","ports":[{"isTLS":false,"name":"http","pathPrefix":"/","port":30211,"protocol":"HTTP"}]},"interConnProtocol":"kuscia","requestHeadersToAdd":{"Authorization":"Bearer"},"source":"alice","tokenConfig":{"rollingUpdatePeriod":86400,"tokenGenMethod":"RSA-GEN"}}}
  creationTimestamp: "2024-08-28T08:18:32Z"
  generation: 4
  labels:
    kuscia.secertflow/domainroute-partner: bob
    kuscia.secretflow/clusterdomainroute-destination: bob
    kuscia.secretflow/clusterdomainroute-source: alice
  name: alice-bob
  resourceVersion: "244748"
  uid: 5dc73e07-1e11-4e05-981b-289aedd92764
spec:
  authenticationType: Token
  destination: bob
  endpoint:
    host: 10.2.1.4
    ports:
    - name: http
      pathPrefix: /
      port: 30211
      protocol: HTTP
  interConnProtocol: kuscia
  requestHeadersToAdd:
    Authorization: Bearer
  source: alice
  tokenConfig:
    destinationPublicKey: LS0tLS1CRUdJTiBSU0EgUFVCTElDIEtFWS0tLS0tCk1JSUJDZ0tDQVFFQWd1WjY0NWhiMWJUckdnbERnYXVoQ2h4NVBDTlc0OVd0ZGhqNHUzYnFiTlBHb0VLRm91SHIKWmZPbkt0R2lWeFR6cTFDMUZ4YUhIdFFiQ1Noc3NQSjZhWkE5amRWTzdDN3BMQXo0eDJ3K29WK0xlb1ZhUkRkRgp3cHVUTjJYRFV3QkNQNzRJL3NhS0o1MndadzZZRnpCb09ibEgwNUJ0dWwxN1dOdmJqQUhlbjJ3dEZzMTdVQWVpCkUvL2RTQnhpbkIzQ2JjYjBaenpLSmdlVnJqalI2c2UvbjhxNTBYNFBNMEd4YUpYNDh3bmhVNi9BMXlDZmxCeFEKWEVoTGQ3ekFhcEFLSjREY2FyVWIvNnFBM1RWV0pkUENiZXdzSDNESEtEVEg1VTFtVnp2dE1RT3MvVGlsVXVpYgpza3hTR2lBMzh4TzFWL0dTVTY3NDR0MGxNelA1KzUyTEN3SURBUUFCCi0tLS0tRU5EIFJTQSBQVUJMSUMgS0VZLS0tLS0K
    rollingUpdatePeriod: 86400
    sourcePublicKey: LS0tLS1CRUdJTiBSU0EgUFVCTElDIEtFWS0tLS0tCk1JSUJDZ0tDQVFFQXFpNU9nZ1pFUnRDZWI3QndJQWpkZDFhdVRibldoVEk0cUtGOFEwSmNBREYwNE9WVTJNbncKMXhOZXlpdmhjc0h3M1dKYmlTT0dQL1hKNUl1ZTVtQitYSXRJRFcrZ0E1TThILzJ0YlFDRWdhNVRhaXFuNjJlbgovRjYvUlM3VjFOYU1iN1lVYmdxeTIwb0NuZkNKQjNjSDI4YWwxQ1dyNXRJNG44VkxneG5ycmc5YWdBdmRCNkh5CjhWWUYwckptemlMUS9HNHJnaHl3MkcxNG80TlprNEVlT2ZSbHFhNU04enYrVSsxalc2NkJOamQzTWN2RXRrc24KUVFjbndNWkhxaWpIK0orZVVQQzh4UHBKbFpyd1dGZjgvbWdreDFSVmR4blF0aUlaWlZTR3JDUC9ydXVoNE5kcwpucHJESlVCK1gwd2twdG5vc2ZPenprN2F4YVh6bldWQjB3SURBUUFCCi0tLS0tRU5EIFJTQSBQVUJMSUMgS0VZLS0tLS0K
    tokenGenMethod: RSA-GEN
status:
  conditions:
  - lastTransitionTime: "2024-08-28T08:22:07Z"
    lastUpdateTime: "2024-08-28T08:22:07Z"
    message: TokenNotGenerate
    reason: DestinationIsNotAuthrized
    status: "False"
    type: Ready
  endpointStatuses:
    kuscia-autonomy-alice-c798bfb4-vtf7v-http:
      endpointHealthy: true
  tokenStatus: {}
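
The `READY` column and the `status.conditions` entry above carry the actual failure reason. The following Python sketch shows one way to pull that out of `kubectl get cdr alice-bob -o json` output. The function names are hypothetical, and the sample JSON is trimmed from the YAML above rather than captured from a live cluster.

```python
import json


def ready_condition(cdr: dict) -> dict:
    """Return the condition of type "Ready", or an empty dict if absent."""
    for cond in cdr.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return {}


def summarize(cdr: dict) -> str:
    """One-line summary of whether the route is ready, and why not."""
    name = cdr.get("metadata", {}).get("name", "<unknown>")
    cond = ready_condition(cdr)
    if cond.get("status") == "True":
        return f"{name}: ready"
    return (f"{name}: not ready "
            f"(reason={cond.get('reason')}, message={cond.get('message')})")


# Trimmed from the ClusterDomainRoute shown above.
SAMPLE = json.loads("""
{
  "kind": "ClusterDomainRoute",
  "metadata": {"name": "alice-bob"},
  "status": {
    "conditions": [
      {"type": "Ready", "status": "False",
       "reason": "DestinationIsNotAuthrized", "message": "TokenNotGenerate"}
    ],
    "tokenStatus": {}
  }
}
""")

print(summarize(SAMPLE))
# alice-bob: not ready (reason=DestinationIsNotAuthrized, message=TokenNotGenerate)
```

To use it against a live route, replace `SAMPLE` with `json.load(sys.stdin)` and pipe the `kubectl ... -o json` output into the script.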

RotaercAH commented 2 months ago

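Resetting the database and reinstalling resolved the problem. Was it caused by historical data in the database interfering with re-authorization? If I change a setting in the configmap, for example NOTLS to TLS, and restart the Pod so the new configmap takes effect, which steps should I run to re-authorize? I re-ran all of the authorization steps, and that is when the authorization failure appeared.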

zimu-yuxi commented 2 months ago

Did you delete the previous route before re-authorizing? See here for reference.

RotaercAH commented 2 months ago

Did you delete the previous route before re-authorizing? See here for reference.

That is a very effective solution, thanks.
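
The fix that worked was to delete the stale route before redoing the authorization. The linked guide is not preserved in this thread, so the following is only a sketch of that first step in Python. It assumes the route is named `<source>-<destination>` as in the `alice-bob` example above, and that `kubectl` is run where it can reach Kuscia's API server (inside the Kuscia pod in this issue). The helper names are hypothetical, and the re-authorization steps themselves are left to the Kuscia documentation.

```python
import subprocess
from typing import List


def route_reset_commands(source: str, destination: str) -> List[List[str]]:
    """Commands to remove a stale ClusterDomainRoute and list what remains.

    Assumes the route name is "<source>-<destination>", as in "alice-bob".
    """
    route = f"{source}-{destination}"
    return [
        ["kubectl", "delete", "cdr", route],  # drop the old route
        ["kubectl", "get", "cdr"],            # confirm it is gone
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: only prints the commands, nothing is executed.
run_commands(route_reset_commands("alice", "bob"))
```

After the old route is gone, redo the authorization steps and check that the `READY` column of `kubectl get cdr` turns `True`. In an autonomy deployment the other party may hold its own route that needs the same treatment.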