Closed. magic-hya closed this issue 4 weeks ago.
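Hello. First, try pulling the image directly on the local machine and check whether that succeeds. Also, enter the Kuscia pod and share the SecretFlow (sf) AppImage configuration: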
kubectl get Appimage
kubectl get Appimage xxx -oyaml
Pulling the image on the host:
$ docker pull harbor.com/secretflow/secretflow-lite-anolis8:1.7.0b0
1.7.0b0: Pulling from secretflow/secretflow-lite-anolis8
Digest: sha256:9c2ea53baf6f252d31cc7fc46cbd878b85321d3edc1009637ac96a37088fd8a2
Status: Image is up to date for harbor.com/secretflow/secretflow-lite-anolis8:1.7.0b0
harbor.com/secretflow/secretflow-lite-anolis8:1.7.0b0
Entering the Kuscia master node:
$ kubectl exec -it kuscia-master-55bffb8764-l7nn5 -n kuscia -- bash
$ kubectl get Appimage
NAME AGE
secretflow-image 3d22h
AppImage details:
$ kubectl get Appimage secretflow-image -oyaml
apiVersion: kuscia.secretflow/v1alpha1
kind: AppImage
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"AppImage","metadata":{"annotations":{},"name":"secretflow-image"},"spec":{"configTemplates":{"task-config.conf":"{\n \"task_id\": \"{{.TASK_ID}}\",\n \"task_input_config\": \"{{.TASK_INPUT_CONFIG}}\",\n \"task_cluster_def\": \"{{.TASK_CLUSTER_DEFINE}}\",\n \"allocated_ports\": \"{{.ALLOCATED_PORTS}}\"\n}\n"},"deployTemplates":[{"name":"secretflow","replicas":1,"spec":{"containers":[{"args":["-c","python -m secretflow.kuscia.entry ./kuscia/task-config.conf"],"command":["sh"],"configVolumeMounts":[{"mountPath":"/root/kuscia/task-config.conf","subPath":"task-config.conf"}],"name":"secretflow","ports":[{"name":"spu","port":20000,"protocol":"GRPC","scope":"Cluster"},{"name":"fed","port":20001,"protocol":"GRPC","scope":"Cluster"},{"name":"global","port":20002,"protocol":"GRPC","scope":"Domain"},{"name":"node-manager","port":20003,"protocol":"GRPC","scope":"Local"},{"name":"object-manager","port":20004,"protocol":"GRPC","scope":"Local"},{"name":"client-server","port":20005,"protocol":"GRPC","scope":"Local"}],"workingDir":"/root"}],"restartPolicy":"Never"}}],"image":{"id":"abc","name":"harbor.com/secretflow/secretflow-lite-anolis8","sign":"abc","tag":"1.7.0b0"}}}
  creationTimestamp: "2024-08-30T03:21:20Z"
  generation: 1
  name: secretflow-image
  resourceVersion: "229735"
  uid: 690b08d6-2064-4feb-8107-94005cbfe166
spec:
  configTemplates:
    task-config.conf: |
      {
        "task_id": "{{.TASK_ID}}",
        "task_input_config": "{{.TASK_INPUT_CONFIG}}",
        "task_cluster_def": "{{.TASK_CLUSTER_DEFINE}}",
        "allocated_ports": "{{.ALLOCATED_PORTS}}"
      }
  deployTemplates:
  - name: secretflow
    replicas: 1
    spec:
      containers:
      - args:
        - -c
        - python -m secretflow.kuscia.entry ./kuscia/task-config.conf
        command:
        - sh
        configVolumeMounts:
        - mountPath: /root/kuscia/task-config.conf
          subPath: task-config.conf
        name: secretflow
        ports:
        - name: spu
          port: 20000
          protocol: GRPC
          scope: Cluster
        - name: fed
          port: 20001
          protocol: GRPC
          scope: Cluster
        - name: global
          port: 20002
          protocol: GRPC
          scope: Domain
        - name: node-manager
          port: 20003
          protocol: GRPC
          scope: Local
        - name: object-manager
          port: 20004
          protocol: GRPC
          scope: Local
        - name: client-server
          port: 20005
          protocol: GRPC
          scope: Local
        workingDir: /root
      restartPolicy: Never
  image:
    id: abc
    name: harbor.com/secretflow/secretflow-lite-anolis8
    sign: abc
    tag: 1.7.0b0
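The AppImage stores the image reference as two fields, spec.image.name and spec.image.tag. A minimal POSIX shell sketch (the helper names appimage_ref and pulled_ref are hypothetical, not part of Kuscia) that joins the two fields and extracts the reference docker pull reported on the host, so the two strings can be compared directly:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kuscia) for checking that the AppImage and
# the host-side docker pull refer to the same image.

# appimage_ref NAME TAG: join spec.image.name and spec.image.tag as NAME:TAG.
appimage_ref() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "appimage_ref: name and tag are both required" >&2
    return 1
  fi
  printf '%s:%s\n' "$1" "$2"
}

# pulled_ref: read docker pull output on stdin and print its last non-empty
# line, which is where docker reports the reference it pulled.
pulled_ref() {
  awk 'NF { last = $0 } END { print last }'
}
```

For the values in this thread, appimage_ref harbor.com/secretflow/secretflow-lite-anolis8 1.7.0b0 prints the same string as the last line of the host's docker pull output. The two fields can also be read from the cluster with kubectl get appimage secretflow-image -o jsonpath='{.spec.image.name}:{.spec.image.tag}'. Since they match, the image itself is available in Harbor, which points the investigation at how the Kuscia node pulls it.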
Remove the "pullPolicy: remote" setting, then restart the container and try again.
After removing the setting:
# agent image configuration
image:
  defaultRegistry: "harbor"
  registries:
    - name: "harbor"
      endpoint: "harbor.com/secretflow"
      username: "admin"
      password: "Harbor12345"
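With pullPolicy removed, the image named in the AppImage (harbor.com/secretflow/secretflow-lite-anolis8) is expected to come from the "harbor" registry entry configured here. The thread does not show how Kuscia matches an image to a registry entry, so this sketch only checks one assumption: that the endpoint value is a whole-path prefix of the image name. The helper name under_endpoint is hypothetical.

```shell
#!/bin/sh
# under_endpoint IMAGE ENDPOINT: succeed if IMAGE (tag optional) lives under
# ENDPOINT as a whole-path prefix, e.g. harbor.com/secretflow/<repo>.
# Assumption: prefix matching is only a sanity check on the configured values;
# it is not taken from Kuscia's source or documentation.
under_endpoint() {
  local image="${1%/}" endpoint="${2%/}"
  case "$image" in
    "$endpoint"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For the values in this thread, under_endpoint harbor.com/secretflow/secretflow-lite-anolis8:1.7.0b0 harbor.com/secretflow succeeds; an endpoint such as harbor.com/secretflowx would not match, because the comparison is on whole path components rather than raw string prefixes.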
Reapplying the configuration:
kubectl apply -f configmap_lite_alice.yaml
kubectl apply -f configmap_lite_bob.yaml
Deleting the existing pods:
kubectl delete pod kuscia-lite-alice-7ffc99c87d-6d96b -n kuscia
kubectl delete pod kuscia-lite-bob-78c7d58487-5gkxj -n kuscia
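Deleting pods by their generated names means looking those names up each time. Pods managed by a Deployment are named <deployment>-<pod-template-hash>-<random suffix>, so the owning Deployment name can be recovered by dropping the last two dash-separated parts. Assuming the lite pods above belong to Deployments named that way, a sketch with a hypothetical helper:

```shell
#!/bin/sh
# deployment_from_pod POD: strip the random pod suffix and the pod-template
# hash from a Deployment-managed pod name, e.g. kuscia-lite-alice-7ffc99c87d-6d96b.
deployment_from_pod() {
  local rs="${1%-*}"        # drop the random suffix   -> <deployment>-<hash>
  printf '%s\n' "${rs%-*}"  # drop the template hash   -> <deployment>
}
```

For example, kubectl rollout restart deployment/"$(deployment_from_pod kuscia-lite-alice-7ffc99c87d-6d96b)" -n kuscia replaces the pods through a normal rolling update, so the new pods pick up the reapplied ConfigMap without needing the current pod names.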
After launching a task, it still fails:
$ kubectl logs secretflow-task-20240905164634-single-psi-0 -n alice
Error from server: Get "https://192.168.30.173:10250/containerLogs/alice/secretflow-task-20240905164634-single-psi-0/secretflow": proxy error from 0.0.0.0:6443 while dialing 192.168.30.173:10250, code 502: 502 Bad Gateway
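This 502 is returned by the API server's proxy while dialing the node endpoint 192.168.30.173:10250, so kubectl logs fails before any container log is read. A sketch with a hypothetical helper, assuming the error text keeps the "while dialing HOST:PORT" wording shown above, that extracts the endpoint so it can be probed on its own:

```shell
#!/bin/sh
# dialed_endpoint: read a kubectl error message on stdin and print the
# HOST:PORT that follows "while dialing". Prints nothing if the phrase is absent.
dialed_endpoint() {
  sed -n 's/.*while dialing \([^ ,]*\).*/\1/p'
}
```

The printed endpoint can then be checked from the master, for example with curl -k https://192.168.30.173:10250/. Any HTTP response, even an error status, shows the port is reachable; a timeout or connection refusal matches the 502.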
Inside the Kuscia container, try: kuscia image pull harbor.com/secretflow/secretflow-lite-anolis8 --creds admin: Harbor12345
$ kuscia image pull harbor.com/secretflow/secretflow-lite-anolis8 --creds admin: Harbor12345
Error: unknown flag: --creds
unknown flag: --creds
It looks like the command is wrong.
The format may be wrong; use this form instead: kuscia image pull --creds username:password image:tag
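In the first suggestion the credentials were written as "admin: Harbor12345" with a space, which the shell splits into two arguments. Assuming --creds takes a single username:password string as in the form above, the flag value would be "admin:" with an empty password, and Harbor12345 would become a stray positional argument. A pre-flight check for the value, with a hypothetical helper name:

```shell
#!/bin/sh
# check_creds VALUE: succeed only if VALUE looks like user:password, with a
# non-empty user and password and no whitespace anywhere.
check_creds() {
  case "$1" in
    *[[:space:]]*)
      echo "check_creds: whitespace in credentials" >&2
      return 1 ;;
  esac
  local user="${1%%:*}" pass="${1#*:}"
  # user equal to the whole value means there was no ':' at all
  if [ "$user" = "$1" ] || [ -z "$user" ] || [ -z "$pass" ]; then
    echo "check_creds: expected user:password" >&2
    return 1
  fi
  return 0
}
```

On a build whose kuscia image command supports --creds, quoting keeps the value in one argument: check_creds "admin:Harbor12345" && kuscia image pull --creds "admin:Harbor12345" harbor.com/secretflow/secretflow-lite-anolis8:1.7.0b0.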
The command does not seem to have a --creds flag:
$ kuscia image pull --creds admin:Harbor12345 harbor.com/secretflow/kuscia-secretflow:v1
Error: unknown flag: --creds
unknown flag: --creds
$ kuscia image pull --help
Manage images

Usage:
  kuscia image [command]

Available Commands:
  builtin     Load a built-in image
  load        Load an image from a tar archive or STDIN

Flags:
  -h, --help           help for image
      --store string   kuscia image storage directory (default "/root/.kuscia/var/images")

Use "kuscia image [command] --help" for more information about a command.
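The help output lists only builtin and load under Available Commands, so this kuscia binary has no pull subcommand at all, and the unknown flag error most likely follows from that rather than from the credential format. A sketch with a hypothetical helper that checks cobra-style help text for a subcommand before trying it:

```shell
#!/bin/sh
# has_subcommand NAME: read --help output on stdin and succeed if NAME is
# listed in its "Available Commands:" section (indented lines up to a blank line).
has_subcommand() {
  awk -v want="$1" '
    /^Available Commands:/ { in_cmds = 1; next }
    in_cmds && NF == 0     { in_cmds = 0 }
    in_cmds && /^[^ ]/     { in_cmds = 0 }
    in_cmds && $1 == want  { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, kuscia image --help | has_subcommand pull fails on the build shown above and would succeed on a build that provides the pull subcommand.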
Run kuscia -v and share the Kuscia version number.
I'm using the RunP deployment mode, with this image:
secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-secretflow
$ kuscia -v
kuscia version 6994ca0
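The reported version, 6994ca0, has the shape of a short git commit hash rather than a release number such as the 0.11.0b0 requested below. Assuming a release build reports its release number (the thread does not show what a 0.11.0b0 build actually prints), a sketch that tells the two shapes apart; the helper and the version pattern are assumptions:

```shell
#!/bin/sh
# version_kind: read `kuscia -v` output on stdin and classify its last word:
#   "release" for X.Y.Z with an optional a/b/rc suffix (e.g. 0.11.0b0),
#   "commit"  for a 7 to 40 character lowercase hex string (e.g. 6994ca0),
#   "unknown" otherwise.
version_kind() {
  awk 'NF { v = $NF } END {
    if (v ~ /^v?[0-9]+\.[0-9]+\.[0-9]+((a|b|rc)[0-9]+)?$/) print "release"
    else if (v ~ /^[0-9a-f]+$/ && length(v) >= 7 && length(v) <= 40) print "commit"
    else print "unknown"
  }'
}
```

For example, kuscia -v | version_kind prints commit for the output above.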
Please use the correct Kuscia 0.11.0b0 release: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.11.0b0
I'm using RunP mode, and this is the image the official documentation specifies:
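The following uses two deployment environments, physical machines and K8s, as examples to describe the RunP-based deployment process.
Deploying on physical machines
For the complete procedure, see [Multi-machine deployment of a centralized cluster](https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/deployment/Docker_deployment_kuscia/deploy_master_lite_cn) and [Multi-machine deployment of a peer-to-peer cluster](https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn).
The difference when deploying with RunP is:
Use the kuscia-secretflow image.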
export KUSCIA_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-secretflow
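The documented value has no tag, and Docker resolves an untagged reference to :latest, so it pulls whatever latest points to at the time, which need not be a 0.11.0b0 build; that would be consistent with the commit-hash version reported above. A sketch with a hypothetical helper that flags an untagged KUSCIA_IMAGE before deploying (which tag to pin for kuscia-secretflow is not stated in this thread):

```shell
#!/bin/sh
# image_tag REF: print the tag of an image reference, or nothing if untagged.
# Only the final path component is inspected, so a registry port such as
# host:5000/repo is not mistaken for a tag. Digests (@sha256:...) are not
# handled in this sketch.
image_tag() {
  local last="${1##*/}"   # final path component, e.g. kuscia:0.11.0b0
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *) ;;
  esac
}
```

For example: [ -n "$(image_tag "$KUSCIA_IMAGE")" ] || echo "warning: KUSCIA_IMAGE has no tag, docker will use :latest" >&2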
Issue Type
Running
Search for existing issues similar to yours
Yes
OS Platform and Distribution
CentOS Linux 7
Kuscia Version
Kuscia v0.11.0b0, centralized deployment on K8s
Deployment
k8s
deployment Version
k8s 1.22.2
App Running type
secretflow
App Running version
secretflow/secretflow-lite-anolis8:1.7.0b0
Configuration file used to run kuscia.
Kuscia log output.
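The pod reports an image error.
The logs report an access error.
kubectl describe shows that the image is not in the local repository, i.e. the remote pull mode did not take effect.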