secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

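Centralized deployment v0.11.0b0: custom algorithm image registered successfully and already present in the lite node, but running a task fails with "image not found" #415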

Closed. Tiger007x closed this issue 2 months ago.

Tiger007x commented 2 months ago

Issue Type

Running

Search for existing issues similar to yours

Yes

OS Platform and Distribution

Ubuntu 22.04

Kuscia Version

0.11.0b0

Deployment

docker

deployment Version

docker 27.1.2

App Running type

secretflow

App Running version

secretflow 1.8.0b0

Configuration file used to run kuscia.

kubectl get appimage xuyh-image -o yaml:

apiVersion: kuscia.secretflow/v1alpha1
kind: AppImage
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"AppImage","metadata":{"annotations":{},"name":"xuyh-image"},"spec":{"configTemplates":{"task-config.conf":"{\n  \"task_id\": \"{{.TASK_ID}}\",\n  \"task_input_config\": \"{{.TASK_INPUT_CONFIG}}\",\n  \"task_cluster_def\": \"{{.TASK_CLUSTER_DEFINE}}\",\n  \"allocated_ports\": \"{{.ALLOCATED_PORTS}}\"\n}\n"},"deployTemplates":[{"name":"secretflow","replicas":1,"spec":{"containers":[{"args":["-c","python -m secretflow.kuscia.entry ./kuscia/task-config.conf"],"command":["sh"],"configVolumeMounts":[{"mountPath":"./kuscia/task-config.conf","subPath":"task-config.conf"}],"name":"secretflow","ports":[{"name":"spu","port":20000,"protocol":"GRPC","scope":"Cluster"},{"name":"fed","port":20001,"protocol":"GRPC","scope":"Cluster"},{"name":"global","port":20002,"protocol":"GRPC","scope":"Domain"},{"name":"node-manager","port":20003,"protocol":"GRPC","scope":"Local"},{"name":"object-manager","port":20004,"protocol":"GRPC","scope":"Local"},{"name":"client-server","port":20005,"protocol":"GRPC","scope":"Local"}],"workingDir":"/root"}],"restartPolicy":"Never"}}],"image":{"name":"secretflow/sf-dev-anolis8","tag":"pir_image0805"}}}
  creationTimestamp: "2024-08-31T15:05:24Z"
  generation: 1
  name: xuyh-image
  resourceVersion: "1082344"
  uid: 4b451220-0165-462e-a218-18ba0019c598
spec:
  configTemplates:
    task-config.conf: |
      {
        "task_id": "{{.TASK_ID}}",
        "task_input_config": "{{.TASK_INPUT_CONFIG}}",
        "task_cluster_def": "{{.TASK_CLUSTER_DEFINE}}",
        "allocated_ports": "{{.ALLOCATED_PORTS}}"
      }
  deployTemplates:
  - name: secretflow
    replicas: 1
    spec:
      containers:
      - args:
        - -c
        - python -m secretflow.kuscia.entry ./kuscia/task-config.conf
        command:
        - sh
        configVolumeMounts:
        - mountPath: ./kuscia/task-config.conf
          subPath: task-config.conf
        name: secretflow
        ports:
        - name: spu
          port: 20000
          protocol: GRPC
          scope: Cluster
        - name: fed
          port: 20001
          protocol: GRPC
          scope: Cluster
        - name: global
          port: 20002
          protocol: GRPC
          scope: Domain
        - name: node-manager
          port: 20003
          protocol: GRPC
          scope: Local
        - name: object-manager
          port: 20004
          protocol: GRPC
          scope: Local
        - name: client-server
          port: 20005
          protocol: GRPC
          scope: Local
        workingDir: /root
      restartPolicy: Never
  image:
    name: secretflow/sf-dev-anolis8
    tag: pir_image0805

What happened and what you expected to happen.

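I registered the custom algorithm image xuyh-image. I also loaded the packaged tar image secretflow/sf-dev-anolis8:pir_image0805 into Docker on the host. Then I imported it into the lite node using the script's --import option. Running the script again reports that the image already exists. However, when a task runs, it reports that the algorithm image cannot be found and must be downloaded from the official site.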

Kuscia log output.

{
                                    "domain_id": "mt-e5idfqmk83",
                                    "state": "Pending",
                                    "err_msg": "container[secretflow] waiting state reason: \"ErrImagePull\", message: \"faile to pull image \\\"secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/sf-dev-anolis8:pir_image0805\\\", detail-> rpc error: code = Unknown desc = failed to pull and unpack image \\\"secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/sf-dev-anolis8:pir_image0805\\\": failed to resolve reference \\\"secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/sf-dev-anolis8:pir_image0805\\\": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed\";",
                                    "endpoints": [
                                        {
                                            "port_name": "fed",
                                            "scope": "Cluster",
                                            "endpoint": "pirnew-006-partner-0-fed.mt-e5idfqmk83.svc"
                                        },
                                        {
                                            "port_name": "global",
                                            "scope": "Domain",
                                            "endpoint": "pirnew-006-partner-0-global.mt-e5idfqmk83.svc:32371"
                                        },
                                        {
                                            "port_name": "spu",
                                            "scope": "Cluster",
                                            "endpoint": "pirnew-006-partner-0-spu.mt-e5idfqmk83.svc"
                                        }
                                    ]
                                }
Tiger007x commented 2 months ago

(Supplementary screenshots: three WeChat screenshots, not reproduced here.)

wangzul commented 2 months ago

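When registering SecretFlow with Kuscia, write the image name as docker.io/<image-name>:<image-tag>. It looks like you configured secretflow/sf-dev-anolis8:pir_image0805 when importing. A name without a registry prefix defaults to the Aliyun registry after import, so the final image address becomes:
<Aliyun registry address>/secretflow/sf-dev-anolis8:pir_image0805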
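The maintainer's explanation amounts to a name-normalization rule: an image reference with no registry host gets the default (Aliyun) registry prepended. The Python sketch below models that rule to show why adding docker.io/ changes the resolved address. It is hypothetical, because Kuscia's actual resolution code is not shown in this thread. The host-detection heuristic follows Docker's reference convention: the first path component counts as a registry host if it contains "." or ":" or is "localhost". The names DEFAULT_REGISTRY, has_registry_host and resolve_image_ref are illustrative. The registry value is copied from the error message in the log.

```python
# Hypothetical model of the prefixing behaviour described in this thread;
# not Kuscia's actual implementation.

# Registry host as it appears in the ErrImagePull message in the log.
DEFAULT_REGISTRY = "secretflow-registry.cn-hangzhou.cr.aliyuncs.com"


def has_registry_host(name: str) -> bool:
    """Return True if the first path component looks like a registry host.

    Follows Docker's convention: the component must be followed by '/'
    and contain '.' or ':' or be exactly 'localhost'.
    """
    first, sep, _ = name.partition("/")
    if not sep:
        return False
    return "." in first or ":" in first or first == "localhost"


def resolve_image_ref(name: str, tag: str,
                      default_registry: str = DEFAULT_REGISTRY) -> str:
    """Prepend the default registry when the name has no explicit host."""
    if not has_registry_host(name):
        name = f"{default_registry}/{name}"
    return f"{name}:{tag}"


if __name__ == "__main__":
    # Registered without a host: resolves to the Aliyun registry (pull fails).
    print(resolve_image_ref("secretflow/sf-dev-anolis8", "pir_image0805"))
    # Registered with docker.io/: left unchanged.
    print(resolve_image_ref("docker.io/secretflow/sf-dev-anolis8", "pir_image0805"))
```

Unlike Docker itself, the sketch does not expand bare single-component names such as ubuntu to docker.io/library/ubuntu. It only illustrates why the explicit docker.io/ prefix keeps the registry from being rewritten.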

Tiger007x commented 2 months ago

OK. After adding docker.io/ and re-registering, it works. With version 0.7, though, the prefix was not needed.