secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Is the node running on the self-built Kuscia image? #432

Open guanglaiguo opened 1 month ago

guanglaiguo commented 1 month ago

Issue Type

Others

Search for existing issues similar to yours

Yes

Kuscia Version

kuscia 0.10.0b0

Link to Relevant Documentation

https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn#alice

Question Details

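I built a Kuscia image with make image, following the official guide for building your own image (https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/development/build_kuscia_cn). The resulting image is secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037. To verify that the build succeeded, I deployed an alice node from this image, following the P2P deployment guide (https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/deployment/Docker_deployment_kuscia/deploy_p2p_cn), and ran the following commands in order: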
export KUSCIA_IMAGE=secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037
export SECRETFLOW_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.7.0b0
docker run --rm $KUSCIA_IMAGE cat /home/kuscia/scripts/deploy/kuscia.sh > kuscia.sh && chmod u+x kuscia.sh
docker run -it --rm ${KUSCIA_IMAGE} kuscia init --mode autonomy --domain "alice" > autonomy_alice.yaml 2>&1 || cat autonomy_alice.yaml
./kuscia.sh start -c autonomy_alice.yaml -p 11080 -k 11081
Afterwards, the following log was printed:
KUSCIA_IMAGE=secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037
SECRETFLOW_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.7.0b0
DATAPROXY_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/dataproxy:0.1.0b1
ROOT=/
DOMAIN_ID=alice
DOMAIN_WORK_DIR=//root-kuscia-autonomy-alice
DOMAIN_LOG_DIR=//root-kuscia-autonomy-alice/logs
DOMAIN_DATA_DIR=//root-kuscia-autonomy-alice/data
DOMAIN_K3S_DB_DIR=//root-kuscia-autonomy-alice/k3s
DOMAIN_HOST_PORT=11080
DOMAIN_HOST_INTERNAL_PORT=13081
KUSCIAAPI_HTTP_PORT=11081
KUSCIAAPI_GRPC_PORT=13083
METRICS_PORT=13084
Starting container root-kuscia-autonomy-alice ...
k3s data already exists //root-kuscia-autonomy-alice/k3s...
Whether to retain k3s data?(y/n): y
root-kuscia-autonomy-alice-containerd
domain_hostname=root-kuscia-autonomy-alice-localhost-localdomain
network=kuscia-exchange
2dba37c5d7736bf2b9e36c6ceee88c2653ef49ba7ca559d657730dd8bbf4e6e4
Probe datamesh successfully
Image 'secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.7.0b0' already exists in container root-kuscia-autonomy-alice
appimage.kuscia.secretflow/secretflow-image unchanged
appimage.kuscia.secretflow/secretflow-nsjail-image unchanged
Create secretflow app image done
Found the engine image 'secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037' on host
Start importing image 'secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037' Please be patient...
error: secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037 import failed
appimage.kuscia.secretflow/diagnose-image configured
Create diagnose app image done
autonomy domain 'alice' deployed successfully

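The log above shows that importing the Kuscia image failed, yet the alice node appears to have started successfully. Is alice actually running on my self-built Kuscia image?

zimu-yuxi commented 1 month ago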

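1. Run docker ps to check whether the container started normally. If it did, enter the kuscia container and run kuscia -v to check the version number.
2. If it did not start normally, try running the uninstall.sh script to uninstall first, then run sh -x kuscia.sh start -c autonomy_alice.yaml -p 11080 -k 11081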
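The first check can be scripted. Below is a minimal sketch, not an official Kuscia tool: the function names are hypothetical, the container name root-kuscia-autonomy-alice is taken from the startup log above, and it assumes that the image tag ends in a build timestamp (as in this build) and that kuscia is on the PATH inside the container.

```shell
#!/bin/sh
# Hypothetical helper: take the tag of an image reference and drop its last
# "-<suffix>" component. ASSUMPTION: that suffix is the build timestamp, as in
# secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037.
image_tag_version() {
  tag="${1##*:}"               # keep everything after the last ':'
  printf '%s\n' "${tag%-*}"    # remove the shortest trailing "-..." part
}

# Hypothetical helper: keep the last space-separated word of `kuscia -v`
# output, e.g. "kuscia version v0.9.0..." becomes "v0.9.0...".
binary_version() {
  printf '%s\n' "${1##* }"
}

# Print the image the container was created from and the version the binary
# inside it reports; succeed only if the two agree.
check_kuscia_container() {
  container="$1"
  image="$(docker inspect --format '{{.Config.Image}}' "$container")" || return 1
  reported="$(docker exec "$container" kuscia -v)" || return 1
  echo "container image: $image"
  echo "binary reports:  $reported"
  [ "$(image_tag_version "$image")" = "$(binary_version "$reported")" ]
}
```

After sourcing the functions, check_kuscia_container root-kuscia-autonomy-alice returns 0 when the container's image tag and the reported binary version match. A mismatch only means the naming assumption above does not hold for your tag, or that the container runs a different image.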

guanglaiguo commented 1 month ago

@zimu-yuxi The container starts. The Kuscia version is as follows:
[root@localhost /]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2dba37c5d773 secretflow/kuscia:v0.9.0.dev240508-13-g68280d4-20240919180037 "tini -- bin/kuscia …" 3 hours ago Up 4 minutes 0.0.0.0:13081->80/tcp, :::13081->80/tcp, 0.0.0.0:11080->1080/tcp, :::11080->1080/tcp, 0.0.0.0:11081->8082/tcp, :::11081->8082/tcp, 0.0.0.0:13083->8083/tcp, :::13083->8083/tcp, 0.0.0.0:13084->9091/tcp, :::13084->9091/tcp root-kuscia-autonomy-alice
3a8024334b4a moby/buildkit:buildx-stable-1 "buildkitd --allow-i…" 21 hours ago Up 4 minutes buildx_buildkit_kuscia0
[root@localhost /]# docker exec -it 2dba37c5d773 /bin/bash
bash-5.2# kuscia -v
kuscia version v0.9.0.dev240508-13-g68280d4

zimu-yuxi commented 1 month ago

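Judging by the version number, this is the image you built yourself, so the error probably has no real impact. We would like to know which source branch the image was built from. You could also deploy a second node to see whether the same problem occurs, then set up routing between the nodes and try running a task. Feel free to report any further problems; we will keep following this issue.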

guanglaiguo commented 1 month ago

@zimu-yuxi I downloaded the source code from here: [screenshot Snipaste_2024-09-25_14-43-45.png, upload did not complete]

zimu-yuxi commented 1 month ago

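Thanks! You can try deploying another node and then run a task to see whether any problems occur. I suggest not naming it bob; try a custom name instead, for example:
docker run -it --rm ${KUSCIA_IMAGE} kuscia init --mode autonomy --domain "bob-test" > autonomy_bob-test.yaml 2>&1 || cat autonomy_bob-test.yaml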
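For trying several custom names, the init command above could be wrapped in a small function. This is only a sketch: config_file_for and init_autonomy_domain are hypothetical names, not Kuscia commands, and the wrapper assumes KUSCIA_IMAGE is already exported as in the original post. It prints the output file when init fails, because in that case the file holds the error output rather than a usable config.

```shell
#!/bin/sh
# Hypothetical helper: build the config file name for a domain, following the
# autonomy_<domain>.yaml naming used in this thread.
config_file_for() {
  printf 'autonomy_%s.yaml\n' "$1"
}

# Hypothetical wrapper around the init command quoted in this thread.
# ASSUMPTION: KUSCIA_IMAGE is exported and docker can run it with a TTY.
init_autonomy_domain() {
  domain="$1"
  cfg="$(config_file_for "$domain")"
  docker run -it --rm "${KUSCIA_IMAGE}" kuscia init --mode autonomy --domain "$domain" > "$cfg" 2>&1 || {
    cat "$cfg"    # on failure the file contains the error messages
    return 1
  }
}
```

For example, init_autonomy_domain bob-test writes autonomy_bob-test.yaml, which can then be passed to ./kuscia.sh start -c with host ports that do not clash with those already used by alice.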

guanglaiguo commented 1 month ago

@zimu-yuxi Thank you very much. I will reach out again if I run into further problems.