secretflow / secretpad

SecretPad is a privacy-preserving computing web platform based on the Kuscia framework, designed to provide easy access to privacy-preserving data intelligence and machine learning functions.
https://www.secretflow.org.cn
Apache License 2.0
41 stars, 23 forks

[Bug]: PSI progress stuck #111

Open randyzxr opened 3 months ago

randyzxr commented 3 months ago

Describe the bug

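The PSI progress bar stays stuck for a long time without moving, even though the intersection involves only a small amount of data.

Log: /data/kuscia/autonomy/secretpad/alice/log/secretpad.log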

2024-07-22 17:48:59 [grpc-default-executor-20] INFO o.s.s.m.integration.job.JobManager - watched jobEvent: each job status

2024-07-22 17:48:59 [grpc-default-executor-20] INFO o.s.s.m.integration.job.JobManager - watched jobEvent: kuscia status task_id: "evzq-mdkrzord-node-3" state: "Pending" alias: "evzq-mdkrzord-node-3"

2024-07-22 17:48:59 [grpc-default-executor-20] INFO o.s.s.m.integration.job.JobManager - watched jobEvent: kuscia status evzq-mdkrzord-node-3 INITIALIZED task_id: "evzq-mdkrzord-node-3" state: "Pending" alias: "evzq-mdkrzord-node-3"

2024-07-22 17:48:59 [grpc-default-executor-20] INFO o.s.s.m.integration.job.JobManager - watched jobEvent: sync result ProjectTaskDO(upk=org.secretflow.secretpad.persistence.entity.ProjectTaskDO$UPK@3c198569, parties=[bob, alice], status=INITIALIZED, errMsg=, graphNodeId=mdkrzord-node-3, graphNode=BaseAggregationRoot(id=null, isDeleted=false, gmtCreate=2024-07-22T09:48:58, gmtModified=2024-07-22T09:48:58))

2024-07-22 17:48:59 [grpc-default-executor-20] INFO o.s.s.s.l.JobTaskLogEventListener - *** JobTaskLogEventListener evzq-mdkrzord-node-3 INITIALIZED INITIALIZED

2024-07-22 17:49:24 [http-nio-8080-exec-4] INFO o.s.s.m.i.datatable.DatatableManager - request table size=4, and response table size=4

2024-07-22 17:49:25 [http-nio-8080-exec-5] INFO o.s.s.m.i.datatable.DatatableManager - request table size=4, and response table size=4

2024-07-22 17:49:25 [http-nio-8080-exec-3] INFO o.s.s.m.i.datatable.DatatableManager - request table size=4, and response table size=4
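To see at a glance which tasks are still "Pending", the `task_id: "..." state: "..."` fragments in the log lines above can be pulled out with a small script. This is only a sketch: the regular expression is derived from the lines quoted in this issue and may not match other SecretPad versions, and the function names `parse_task_state` and `latest_states` are hypothetical helpers, not part of SecretPad or Kuscia.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumption: matches the `task_id: "..." state: "..."` fragment as it
# appears in the secretpad.log lines quoted in this issue.
TASK_STATE_RE = re.compile(
    r'task_id:\s*"(?P<task_id>[^"]+)"\s+state:\s*"(?P<state>[^"]+)"'
)


def parse_task_state(line: str) -> Optional[Tuple[str, str]]:
    """Return (task_id, state) if the line carries a Kuscia task status."""
    match = TASK_STATE_RE.search(line)
    if match is None:
        return None
    return match.group("task_id"), match.group("state")


def latest_states(lines: Iterable[str]) -> Dict[str, str]:
    """Map each task_id to the last state seen for it in the log."""
    states: Dict[str, str] = {}
    for line in lines:
        parsed = parse_task_state(line)
        if parsed is not None:
            task_id, state = parsed
            states[task_id] = state
    return states
```

For example, opening `/data/kuscia/autonomy/secretpad/alice/log/secretpad.log` and passing the file object to `latest_states` would list every task together with its most recent state; a task that never leaves "Pending" is the one to investigate.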

Steps To Reproduce

Project space > create a training flow > joint selection (联合圈选) > select data for the sample tables > start the intersection

Expected behavior

The intersection task completes.

Version

secretpad all in one v1.6.1b0

Operating system

CentOS 7.6 x86_64

Hardware Resources

8 cores, 32 GB RAM (8c32g)

6fj commented 3 months ago

@randyzxr Please provide more log information from the Kuscia pod.

randyzxr commented 3 months ago

No logs have been produced yet for task ID evzq.

screenshot-20240722-190307

john8628 commented 3 months ago

I ran into a similar problem before. You should check the secretflow logs inside the container; they are under /home/kuscia/var/stdout/pods/{pid}/secretflow/
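A quick way to skim those container logs is to scan the directory for lines that look like errors. The following is a minimal sketch, assuming the logs are plain-text `*.log` files and that error lines contain one of the listed keywords; `find_error_lines` is a hypothetical helper, and the directory (with the real pod ID substituted for `{pid}`) has to be supplied by the caller.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def find_error_lines(
    log_dir: str,
    keywords: Sequence[str] = ("ERROR", "Traceback", "Exception"),
) -> List[Tuple[str, int, str]]:
    """Return (file, line number, text) for every line containing a keyword.

    Assumption: logs are text files ending in .log somewhere under log_dir.
    """
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if any(keyword in line for keyword in keywords):
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

Calling it on the pod's secretflow directory and printing the result gives a short list of places to start reading, which is usually quicker than paging through the full log.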

randyzxr commented 3 months ago

After recreating the container, PSI shows as in progress, but the logs contain errors. screenshot-20240723-160056

/data/kuscia/autonomy/secretpad/bob/log/error.log 20240723-155838

/data/kuscia/autonomy/bob/pods/bob_idho-dlwvprlc-node-3-0_1b065d2f-111b-4360-b1b7-83d04d277e49/secretflow/0.log 20240723-160023

wenkesong-li commented 3 months ago

Hello, the error message here indicates that Kuscia cannot be reached. Please check the relevant configuration.
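One basic check before digging into the configuration is whether the Kuscia endpoint accepts TCP connections at all from where SecretPad runs. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and the host and port must be taken from your own deployment's configuration, since this thread does not state them.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only verifies network reachability; it does not validate TLS,
    tokens, or any Kuscia-specific protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If this returns False for the address SecretPad is configured to use, the problem is likely at the network or address level (wrong host, wrong port, container not running); if it returns True, the configuration itself, such as certificates or credentials, is the next thing to check.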