Open coderSun20201112 opened 4 months ago
Does your data contain duplicate records?
OK, I'll check the data and ask the business department.
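I generated 1,000 new test records, of which 560 are in the intersection. The "ID card number" (身份证号码) values of these 560 records are all distinct, and I used "ID card number" as the intersection key column. Even so, the task still fails.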
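To rule out duplicates independently of the PSI library's own check, the key column can be counted directly. The following is only a minimal sketch: the helper name `find_duplicate_keys` and the sample values are made up for illustration, and the column name `证件号码` is taken from the `keys` field of the task configuration in this thread.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_file, key_column):
    """Return {key: count} for every key value that appears more than once."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key_column] for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# Tiny in-memory example; the values are made up for illustration.
sample = io.StringIO("证件号码,name\nA001,x\nA002,y\nA001,z\n")
duplicates = find_duplicate_keys(sample, "证件号码")
print(duplicates)  # {'A001': 2}
```

For a real input file, pass a handle opened with `open(path, encoding="utf-8", newline="")`. An empty result means the key column has no repeated values.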
OK. Is the failure log the same as the one above?
Yes, it is the same.
Is there any more task log information? Inside the kuscia container, you can find the log for the failed task ID under /home/kuscia/var/stdout/.
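One way to locate those files is to search the stdout directory for paths that contain the task ID. This is a rough sketch that assumes the task ID appears somewhere in the directory or file name below /home/kuscia/var/stdout/; the thread does not describe the exact layout, and `find_task_logs` is a hypothetical helper name.

```python
from pathlib import Path


def find_task_logs(stdout_root, task_id):
    """Return sorted paths of files under stdout_root whose relative path contains task_id."""
    root = Path(stdout_root)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and task_id in str(path.relative_to(root))
    )


# Path and task ID are the ones mentioned in this thread.
for log_path in find_task_logs("/home/kuscia/var/stdout", "yqxxeraj"):
    print(log_path)
```

Run it inside the kuscia container; outside of it the directory does not exist and the loop prints nothing.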
I ran one test with RR22. The log information is below:
Pod log:
2024-07-10T18:25:45.296799503+08:00 stdout F [2024-07-10 18:25:45.281] [info] [main.cc:44] SecretFlow PSI Library v0.2.0.dev240123 Copyright 2023 Ant Group Co., Ltd.
2024-07-10T18:25:45.299321156+08:00 stdout F [2024-07-10 18:25:45.299] [info] [main.cc:56] Kuscia task id: yqxxeraj
2024-07-10T18:25:45.317512483+08:00 stderr F I0710 18:25:45.317143 7 external/com_github_brpc_brpc/src/brpc/server.cpp:1158] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=54509.
2024-07-10T18:25:45.317571852+08:00 stderr F W0710 18:25:45.317178 7 external/com_github_brpc_brpc/src/brpc/server.cpp:1164] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-07-10T18:25:48.547728713+08:00 stderr F I0710 18:25:48.547527 26 external/com_github_brpc_brpc/src/brpc/span.cpp:506] Opened ./rpc_data/rpcz/20240710.182548.7/id.db and ./rpc_data/rpcz/20240710.182548.7/time.db
2024-07-10T18:25:51.363771015+08:00 stderr F [978.334] perfetto.cc:45899 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024 KB, total sessions:1, uid:0 session name: ""
2024-07-10T18:25:51.364221936+08:00 stdout F [2024-07-10 18:25:51.364] [info] [launch.cc:115] PSI config: {"protocol_config":{"protocol":"PROTOCOL_RR22","role":"ROLE_SENDER","ecdh_config":{"curve":"CURVE_FOURQ"},"kkrt_config":{"bucket_size":"1048576"},"rr22_config":{"bucket_size":"1048576"}},"input_config":{"type":"IO_TYPE_FILE_CSV","path":"/home/kuscia/var/storage/data/learn_440_1980-01-01.csv"},"output_config":{"type":"IO_TYPE_FILE_CSV","path":"/home/kuscia/var/storage/data/result/yqxxeraj/"},"keys":["证件号码"],"recovery_config":{"enabled":true,"folder":"/home/kuscia/var/storage/data/tmp/yqxxeraj/"},"left_side":"ROLE_RECEIVER"}
2024-07-10T18:25:51.364241907+08:00 stdout F [2024-07-10 18:25:51.364] [info] [sender.cc:35] [Rr22PsiSender::Init] start
2024-07-10T18:25:51.364248729+08:00 stdout F [2024-07-10 18:25:51.364] [info] [interface.cc:76] [AbstractPsiParty::Init] start
2024-07-10T18:25:51.364255072+08:00 stdout F [2024-07-10 18:25:51.364] [warning] [interface.cc:300] check_hash_digest turns off while recovery is enabled. check_hash_digest is modified to true for robustness.
2024-07-10T18:25:51.371614123+08:00 stdout F [2024-07-10 18:25:51.371] [info] [interface.cc:134] [AbstractPsiParty::Init][Check csv pre-process] start
2024-07-10T18:25:51.379577942+08:00 stdout F [2024-07-10 18:25:51.379] [info] [csv_checker.cc:241] Executing script to get duplicates: LC_ALL=C tail -n +2 /tmp/f4dd1be1-c6bb-4781-9feb-eb7db92270c5.psi_checked | LC_ALL=C sort --parallel=8 --buffer-size=1G --stable | LC_ALL=C uniq -d > /tmp/f4dd1be1-c6bb-4781-9feb-eb7db92270c5.psi_checked_duplicates
2024-07-10T18:25:51.414585957+08:00 stdout F [2024-07-10 18:25:51.414] [info] [csv_checker.cc:271] Executing script to get hash digest: sha256sum /tmp/f4dd1be1-c6bb-4781-9feb-eb7db92270c5.psi_checked
2024-07-10T18:25:51.428806196+08:00 stdout F [2024-07-10 18:25:51.428] [info] [interface.cc:143] [AbstractPsiParty::Init][Check csv pre-process] end
2024-07-10T18:25:51.433757927+08:00 stdout F [2024-07-10 18:25:51.433] [info] [interface.cc:183] [AbstractPsiParty::Init] end
2024-07-10T18:25:51.434165661+08:00 stdout F [2024-07-10 18:25:51.433] [info] [sender.cc:40] [Rr22PsiSender::Init] end
2024-07-10T18:25:51.434179781+08:00 stdout F [2024-07-10 18:25:51.434] [info] [sender.cc:45] [Rr22PsiSender::PreProcess] start
2024-07-10T18:25:51.434198772+08:00 stdout F [2024-07-10 18:25:51.434] [info] [bucket_psi.cc:515] psi protocol=3, rank=0 item_size=1000
2024-07-10T18:25:51.434205583+08:00 stdout F [2024-07-10 18:25:51.434] [info] [bucket_psi.cc:515] psi protocol=3, rank=1 item_size=1000
2024-07-10T18:25:51.436311717+08:00 stdout F [2024-07-10 18:25:51.436] [info] [arrow_csv_batch_provider.cc:51] Reach the end of csv file /home/kuscia/var/storage/data/learn_440_1980-01-01.csv.
2024-07-10T18:25:51.436942769+08:00 stdout F [2024-07-10 18:25:51.436] [info] [arrow_csv_batch_provider.cc:51] Reach the end of csv file /home/kuscia/var/storage/data/learn_440_1980-01-01.csv.
2024-07-10T18:25:51.439140404+08:00 stdout F [2024-07-10 18:25:51.439] [info] [sender.cc:79] [Rr22PsiSender::PreProcess] end
2024-07-10T18:25:51.441284522+08:00 stdout F [2024-07-10 18:25:51.441] [info] [sender.cc:84] [Rr22PsiSender::Online] start
2024-07-10T18:25:51.442326142+08:00 stdout F [2024-07-10 18:25:51.441] [info] [recovery.cc:188] RecoveryManager::MarkOnlineStart ecdh_dual_masked_cnt_frompeer = 0
2024-07-10T18:25:51.442357509+08:00 stdout F [2024-07-10 18:25:51.441] [info] [recovery.cc:192] RecoveryManager::MarkOnlineStart parsed_bucket_count_frompeer = 0
2024-07-10T18:25:51.446471601+08:00 stdout F [2024-07-10 18:25:51.446] [info] [bucket.cc:37] psi protocol=3, rank=0, inputs_size=1000
2024-07-10T18:25:51.446489556+08:00 stdout F [2024-07-10 18:25:51.446] [info] [bucket.cc:37] psi protocol=3, rank=1, inputs_size=1000
2024-07-10T18:25:51.44651467+08:00 stdout F [2024-07-10 18:25:51.446] [info] [bucket.cc:50] run psi bucket_idx=0, bucket_item_size=1000
2024-07-10T18:25:51.448829406+08:00 stdout F [2024-07-10 18:25:51.448] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7
2024-07-10T18:25:51.45112501+08:00 stdout F [2024-07-10 18:25:51.450] [info] [rr22_oprf.cc:139] recv paxos seed...
2024-07-10T18:25:51.456876126+08:00 stdout F [2024-07-10 18:25:51.456] [info] [rr22_oprf.cc:145] recv paxos seed finished
2024-07-10T18:25:51.4569054+08:00 stdout F [2024-07-10 18:25:51.456] [info] [rr22_oprf.cc:176] begin vole send
Pod configuration:
[root@root-kuscia-autonomy-renhang kuscia]# kubectl describe pods yqxxeraj-0 --namespace=renhang
Name: yqxxeraj-0
Namespace: renhang
Priority: 0
Service Account: default
Node: root-kuscia-autonomy-renhang/172.18.0.3
Start Time: Wed, 10 Jul 2024 18:25:42 +0800
Labels: kuscia.secretflow/communication-role-client=true
kuscia.secretflow/communication-role-server=true
kuscia.secretflow/controller=kusciatask
kuscia.secretflow/initiator=renhang
kuscia.secretflow/interconn-protocol-type=kuscia
kuscia.secretflow/task-id=yqxxeraj
kuscia.secretflow/task-resource=yqxxeraj-d74ad9f504a9
kuscia.secretflow/task-resource-group=yqxxeraj
task.kuscia.secretflow/pod-name=yqxxeraj-0
task.kuscia.secretflow/pod-role=
Annotations: kuscia.secretflow/config-template-volumes: config-template
kuscia.secretflow/image-id: sha256:ae331537eb75b273358b63a7b67d7aa80c190888cb38064360db5e60b6540b15
kuscia.secretflow/taskresource-reserving-timestamp: 2024-07-10T18:25:42+08:00
Status: Failed
IP:
IPs:
Exit Code: 132
Started: Wed, 10 Jul 2024 18:25:45 +0800
Finished: Wed, 10 Jul 2024 18:25:51 +0800
Ready: False
Restart Count: 0
Environment:
TASK_ID: yqxxeraj
TASK_CLUSTER_DEFINE: {"parties":[{"name":"shanghang", "role":"", "services":[{"portName":"psi", "endpoints":["yqxxeraj-0-psi.shanghang.svc"]}]}, {"name":"renhang", "role":"", "services":[{"portName":"psi", "endpoints":["yqxxeraj-0-psi.renhang.svc"]}]}], "selfPartyIdx":1, "selfEndpointIdx":0}
ALLOCATED_PORTS: {"ports":[{"name":"psi", "port":54509, "scope":"Cluster", "protocol":"HTTP"}]}
TASK_INPUT_CONFIG: {
"sf_psi_config_map": {
"shanghang": {
"link_config": {
"recv_timeout_ms": "30000",
"http_timeout_ms": 30000
},
"psi_config": {
"protocol_config": {
"protocol": "PROTOCOL_RR22",
"role": "ROLE_RECEIVER",
"ecdh_config": {
"curve": "CURVE_FOURQ"
},
"kkrt_config": {
"bucket_size": "1048576"
},
"rr22_config": {
"bucket_size": "1048576"
}
},
"input_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/home/kuscia/var/storage/data/learn_440_1970-01-01.csv"
},
"output_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/home/kuscia/var/storage/data/result/yqxxeraj/result2.csv"
},
"keys": ["证件号码"],
"recovery_config": {
"enabled": true,
"folder": "/home/kuscia/var/storage/data/tmp/yqxxeraj/"
},
"left_side": "ROLE_RECEIVER"
}
},
"renhang": {
"link_config": {
"recv_timeout_ms": "30000",
"http_timeout_ms": 30000
},
"psi_config": {
"protocol_config": {
"protocol": "PROTOCOL_RR22",
"role": "ROLE_SENDER",
"ecdh_config": {
"curve": "CURVE_FOURQ"
},
"kkrt_config": {
"bucket_size": "1048576"
},
"rr22_config": {
"bucket_size": "1048576"
}
},
"input_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/home/kuscia/var/storage/data/learn_440_1980-01-01.csv"
},
"output_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/home/kuscia/var/storage/data/result/yqxxeraj/"
},
"keys": ["证件号码"],
"recovery_config": {
"enabled": true,
"folder": "/home/kuscia/var/storage/data/tmp/yqxxeraj/"
},
"left_side": "ROLE_RECEIVER"
}
}
}
}
Mounts:
/etc/kuscia/task-config.conf from config-template (rw,path="task-config.conf")
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-template:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: yqxxeraj-configtemplate
Optional: false
QoS Class: BestEffort
Node-Selectors: kuscia.secretflow/namespace=renhang
Tolerations: kuscia.secretflow/agent:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Warning FailedScheduling 3m16s kuscia-scheduler 0/1 nodes are available: failed to get task resource renhang/ for pod. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod., can not find related task resource.
Normal Scheduled 3m14s kuscia-scheduler Successfully assigned renhang/yqxxeraj-0 to root-kuscia-autonomy-renhang
Normal Pulled 3m14s Agent Container image "secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:0.2.0.dev240123" already present on machine
Normal Created 3m13s Agent Created container secretflow
Normal Started 3m12s Agent Started container secretflow
Warning MissingClusterDNS 3m11s (x4 over 3m15s) Agent pod: "yqxxeraj-0_renhang(0d848060-9ff7-42f3-9470-ce5c66fc3454)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
[root@root-kuscia-autonomy-renhang kuscia]#
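The pod description reports `Exit Code: 132`, and the log stops right after `begin vole send`. Exit codes above 128 conventionally mean "terminated by signal (code - 128)", so the value can be decoded as in the sketch below. Two assumptions are involved: that the container runtime used here follows that convention, and the helper name `describe_exit_code`, which is made up for illustration.

```python
import signal


def describe_exit_code(code):
    """Explain an exit code using the common 128 + signal number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"exit code {code}: terminated by signal {signum} ({name})"
    return f"exit code {code}: process exited on its own"


print(describe_exit_code(132))
# exit code 132: terminated by signal 4 (SIGILL)
```

SIGILL is raised when a process executes an instruction the CPU does not accept. That would be consistent with a binary built for an instruction set the host lacks, but on its own it does not prove which instruction set is missing.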
In earlier issues I saw people mention AVX and AVX2. Is there a CPU requirement?
Hello. Yes, AVX and AVX2 require a CPU that supports the AVX instruction sets.
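So for my KKRT/RR22 failures, can you tell from the logs whether the cause is that our server does not support AVX/AVX2? If it is not an AVX/AVX2 problem, how should I resolve it?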
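Whether the host CPU advertises AVX and AVX2 can be checked on Linux by reading the `flags` lines in /proc/cpuinfo. The snippet below is a sketch under assumptions: it only applies to x86 Linux hosts, the helper names are hypothetical, and the flag list contains only the two instruction sets named in this thread. The thread does not state the full set of instructions the PSI image needs.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx", "avx2")):
    """Return the required flags that are absent from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            _, _, value = line.partition(":")
            flags.update(value.split())
    return [flag for flag in required if flag not in flags]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


text = read_cpuinfo()
if text:
    print("missing flags:", missing_cpu_flags(text) or "none")
else:
    print("could not read /proc/cpuinfo (not a Linux host?)")
```

If `avx` or `avx2` shows up as missing on the machine that runs the failing party, that points towards the CPU requirement discussed above; if both are present, the cause is more likely elsewhere.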
Issue Type
Others
Search for existing issues similar to yours
Yes
Kuscia Version
kuscia 0.5.0
Link to Relevant Documentation
No response
Question Details