secretflow / secretpad

SecretPad is a privacy-preserving computing web platform based on the Kuscia framework, designed to provide easy access to privacy-preserving data intelligence and machine learning functions.
https://www.secretflow.org.cn
Apache License 2.0

P2P deployment mode: random split fails on execution #166

Closed Inclay97 closed 1 day ago

Inclay97 commented 6 days ago

Issue Type

Running

Have you searched for existing documents and issues?

Yes

OS Platform and Distribution

Linux version 4.19.90-52.40.v2207.ky10.x86_64 (KYLINSOFT@f4c0be98561f) (gcc version 7.3.0 (GCC)) #3 SMP Wed Jul 24 15:07:09 CST 2024

All_in_one Version

SecretPad All In One v1.9.0

Module type

secretflow

Module version

SecretPad All In One v1.9.0

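What happened and what you expected to happen.

I ran a union on two tables from the same node and then a random split, with a training subset size of 0.8 and a test subset size of 0.2.
Execution then failed with an AssertionError.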

Log output.

2024-11-21 14:41:43 INFO the jobId=htot, taskId=htot-mglryfmg-node-46 start ...
2024-11-21 14:41:51 INFO the jobId=htot, taskId=htot-mglryfmg-node-46 failed: party ma failed msg: container[secretflow] terminated state reason "Error", message: "Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
2024-11-21 06:41:45,714|ma|INFO|secretflow|entry.py:start_ray:60| ray_conf: RayConfig(ray_node_ip_address='htot-mglryfmg-node-46-0-global.ma.svc', ray_node_manager_port=26244, ray_object_manager_port=26245, ray_client_server_port=26246, ray_worker_ports=[], ray_min_worker_port=10012, ray_max_worker_port=10112, ray_gcs_port=26243)
2024-11-21 06:41:45,714|ma|INFO|secretflow|entry.py:start_ray:68| Trying to start ray head node at htot-mglryfmg-node-46-0-global.ma.svc, start command: ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=htot-mglryfmg-node-46-0-global.ma.svc --port=26243 --node-manager-port=26244 --object-manager-port=26245 --ray-client-server-port=26246 --min-worker-port=10012 --max-worker-port=10112
2024-11-21 06:41:49,407|ma|INFO|secretflow|entry.py:start_ray:81| 2024-11-21 06:41:46,315\tINFO usage_lib.py:423 -- Usage stats collection is disabled.
2024-11-21 06:41:46,315\tINFO scripts.py:744 -- Local node IP: htot-mglryfmg-node-46-0-global.ma.svc
2024-11-21 06:41:49,274\tSUCC scripts.py:781 -- --------------------
2024-11-21 06:41:49,274\tSUCC scripts.py:782 -- Ray runtime started.
2024-11-21 06:41:49,274\tSUCC scripts.py:783 -- --------------------
2024-11-21 06:41:49,274\tINFO scripts.py:785 -- Next steps
2024-11-21 06:41:49,275\tINFO scripts.py:788 -- To add another node to this Ray cluster, run
2024-11-21 06:41:49,275\tINFO scripts.py:791 --   ray start --address='htot-mglryfmg-node-46-0-global.ma.svc:26243'
2024-11-21 06:41:49,275\tINFO scripts.py:800 -- To connect to this Ray cluster:
2024-11-21 06:41:49,275\tINFO scripts.py:802 -- import ray
2024-11-21 06:41:49,275\tINFO scripts.py:803 -- ray.init(_node_ip_address='htot-mglryfmg-node-46-0-global.ma.svc')
2024-11-21 06:41:49,275\tINFO scripts.py:834 -- To terminate the Ray runtime, run
2024-11-21 06:41:49,275\tINFO scripts.py:835 --   ray stop
2024-11-21 06:41:49,275\tINFO scripts.py:838 -- To view the status of the cluster, use
2024-11-21 06:41:49,275\tINFO scripts.py:839 --   ray status

2024-11-21 06:41:49,407|ma|INFO|secretflow|entry.py:start_ray:82| Succeeded to start ray head node at htot-mglryfmg-node-46-0-global.ma.svc.
2024-11-21 06:41:49,408|ma|INFO|secretflow|entry.py:main:557| datasource.access_directly False
sf_node_eval_param  {
  \"domain\": \"data_prep\",
  \"name\": \"train_test_split\",
  \"version\": \"0.0.1\",
  \"attrPaths\": [
    \"train_size\",
    \"test_size\",
    \"random_state\",
    \"shuffle\"
  ],
  \"attrs\": [
    {
      \"f\": 0.8
    },
    {
      \"f\": 0.2
    },
    {
      \"i64\": \"1024\"
    },
    {
      \"b\": true
    }
  ],
  \"checkpointUri\": \"ckhtot-mglryfmg-node-46-output-0\"
} 
2024-11-21 06:41:49,427|ma|ERROR|secretflow|entry.py:<module>:601| unexpected exception
Traceback (most recent call last):
  File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 599, in <module>
    main()
  File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__
    return self.main(*args, **kwargs)
  File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1078, in main
    rv = self.invoke(ctx)
  File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 783, in invoke
    return __callback(*args, **kwargs)
  File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 561, in main
    sf_node_eval_param = preprocess_sf_node_eval_param(
  File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 328, in preprocess_sf_node_eval_param
    domaindata_id_to_dist_data(
  File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 180, in domaindata_id_to_dist_data
    assert dist_data.type in set(input_def.types)
AssertionError
"
zimu-yuxi commented 6 days ago

Could you show what the task flow on your canvas looks like?

Inclay97 commented 6 days ago

image

Inclay97 commented 6 days ago

htot_mglryfmg_node_40_output_0.csv is the result of the union.

zimu-yuxi commented 6 days ago

Go into the kuscia container and look up the task execution log by task ID. The path is home/kuscia/var/stdout/<task ID>/pod/secretflow/0.log
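For anyone who would rather not browse the directory by hand, here is a minimal Python sketch that searches the stdout directory for the secretflow `0.log` of a given task. It assumes only what the comment above says (a `secretflow/0.log` file somewhere under the stdout root, with the task ID appearing in the path); the function name `find_task_logs` and the exact directory layout are illustrative assumptions, so adjust them to what you actually see in the container.

```python
from pathlib import Path


def find_task_logs(stdout_root, task_id):
    """Return every secretflow/0.log under stdout_root whose path mentions task_id.

    Hypothetical helper: the real directory layout may differ between Kuscia
    versions, so this searches recursively instead of hard-coding it.
    """
    root = Path(stdout_root)
    return sorted(
        path
        for path in root.rglob("0.log")
        # Keep only the secretflow container's log for the requested task.
        if path.parent.name == "secretflow"
        and task_id in path.relative_to(root).as_posix()
    )


if __name__ == "__main__":
    # Assumed root (absolute form of the path quoted above); task ID from this issue.
    for log_path in find_task_logs("/home/kuscia/var/stdout", "htot-mglryfmg-node-46"):
        print(log_path)
```

Run it inside the kuscia container; if nothing is printed, check the root path first.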

Inclay97 commented 6 days ago

2024-11-21T14:41:42.952145357+08:00 stderr F Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
2024-11-21T14:41:45.714927847+08:00 stdout F 2024-11-21 06:41:45,714|ma|INFO|secretflow|entry.py:start_ray:60| ray_conf: RayConfig(ray_node_ip_address='htot-mglryfmg-node-46-0-global.ma.svc', ray_node_manager_port=26244, ray_object_manager_port=26245, ray_client_server_port=26246, ray_worker_ports=[], ray_min_worker_port=10012, ray_max_worker_port=10112, ray_gcs_port=26243)
2024-11-21T14:41:45.71508624+08:00 stdout F 2024-11-21 06:41:45,714|ma|INFO|secretflow|entry.py:start_ray:68| Trying to start ray head node at htot-mglryfmg-node-46-0-global.ma.svc, start command: ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=htot-mglryfmg-node-46-0-global.ma.svc --port=26243 --node-manager-port=26244 --object-manager-port=26245 --ray-client-server-port=26246 --min-worker-port=10012 --max-worker-port=10112
2024-11-21T14:41:49.409253462+08:00 stdout F 2024-11-21 06:41:49,407|ma|INFO|secretflow|entry.py:start_ray:81| 2024-11-21 06:41:46,315 INFO usage_lib.py:423 -- Usage stats collection is disabled.
2024-11-21T14:41:49.409301472+08:00 stdout F 2024-11-21 06:41:46,315 INFO scripts.py:744 -- Local node IP: htot-mglryfmg-node-46-0-global.ma.svc
2024-11-21T14:41:49.409311447+08:00 stdout F 2024-11-21 06:41:49,274 SUCC scripts.py:781 -- --------------------
2024-11-21T14:41:49.409316973+08:00 stdout F 2024-11-21 06:41:49,274 SUCC scripts.py:782 -- Ray runtime started.
2024-11-21T14:41:49.409321927+08:00 stdout F 2024-11-21 06:41:49,274 SUCC scripts.py:783 -- --------------------
2024-11-21T14:41:49.409329365+08:00 stdout F 2024-11-21 06:41:49,274 INFO scripts.py:785 -- Next steps
2024-11-21T14:41:49.409336787+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:788 -- To add another node to this Ray cluster, run
2024-11-21T14:41:49.409344569+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:791 -- ray start --address='htot-mglryfmg-node-46-0-global.ma.svc:26243'
2024-11-21T14:41:49.409349654+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:800 -- To connect to this Ray cluster:
2024-11-21T14:41:49.409354999+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:802 -- import ray
2024-11-21T14:41:49.409360342+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:803 -- ray.init(_node_ip_address='htot-mglryfmg-node-46-0-global.ma.svc')
2024-11-21T14:41:49.40936588+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:834 -- To terminate the Ray runtime, run
2024-11-21T14:41:49.409370959+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:835 -- ray stop
2024-11-21T14:41:49.409392519+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:838 -- To view the status of the cluster, use
2024-11-21T14:41:49.40939771+08:00 stdout F 2024-11-21 06:41:49,275 INFO scripts.py:839 -- ray status
2024-11-21T14:41:49.409402454+08:00 stdout F
2024-11-21T14:41:49.409411626+08:00 stdout F 2024-11-21 06:41:49,407|ma|INFO|secretflow|entry.py:start_ray:82| Succeeded to start ray head node at htot-mglryfmg-node-46-0-global.ma.svc.
2024-11-21T14:41:49.409722151+08:00 stdout F 2024-11-21 06:41:49,408|ma|INFO|secretflow|entry.py:main:557| datasource.access_directly False
2024-11-21T14:41:49.409764447+08:00 stdout F sf_node_eval_param {
2024-11-21T14:41:49.409770769+08:00 stdout F "domain": "data_prep",
2024-11-21T14:41:49.409781406+08:00 stdout F "name": "train_test_split",
2024-11-21T14:41:49.40978639+08:00 stdout F "version": "0.0.1",
2024-11-21T14:41:49.409793234+08:00 stdout F "attrPaths": [
2024-11-21T14:41:49.409798559+08:00 stdout F "train_size",
2024-11-21T14:41:49.409803278+08:00 stdout F "test_size",
2024-11-21T14:41:49.409807953+08:00 stdout F "random_state",
2024-11-21T14:41:49.409812773+08:00 stdout F "shuffle"
2024-11-21T14:41:49.409817523+08:00 stdout F ],
2024-11-21T14:41:49.40982254+08:00 stdout F "attrs": [
2024-11-21T14:41:49.409827479+08:00 stdout F {
2024-11-21T14:41:49.409832434+08:00 stdout F "f": 0.8
2024-11-21T14:41:49.409857036+08:00 stdout F },
2024-11-21T14:41:49.409862408+08:00 stdout F {
2024-11-21T14:41:49.409867392+08:00 stdout F "f": 0.2
2024-11-21T14:41:49.409872453+08:00 stdout F },
2024-11-21T14:41:49.409878006+08:00 stdout F {
2024-11-21T14:41:49.40988297+08:00 stdout F "i64": "1024"
2024-11-21T14:41:49.409887701+08:00 stdout F },
2024-11-21T14:41:49.409892621+08:00 stdout F {
2024-11-21T14:41:49.409897327+08:00 stdout F "b": true
2024-11-21T14:41:49.409902087+08:00 stdout F }
2024-11-21T14:41:49.409906778+08:00 stdout F ],
2024-11-21T14:41:49.409912434+08:00 stdout F "checkpointUri": "ckhtot-mglryfmg-node-46-output-0"
2024-11-21T14:41:49.409917282+08:00 stdout F }
2024-11-21T14:41:49.431064135+08:00 stdout F 2024-11-21 06:41:49,427|ma|ERROR|secretflow|entry.py:<module>:601| unexpected exception
2024-11-21T14:41:49.431090315+08:00 stdout F Traceback (most recent call last):
2024-11-21T14:41:49.431097017+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 599, in <module>
2024-11-21T14:41:49.43110232+08:00 stdout F main()
2024-11-21T14:41:49.431107349+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
2024-11-21T14:41:49.431112673+08:00 stdout F return self.main(*args, **kwargs)
2024-11-21T14:41:49.431117545+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
2024-11-21T14:41:49.43112321+08:00 stdout F rv = self.invoke(ctx)
2024-11-21T14:41:49.431128077+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
2024-11-21T14:41:49.431133618+08:00 stdout F return ctx.invoke(self.callback, **ctx.params)
2024-11-21T14:41:49.431159282+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
2024-11-21T14:41:49.431165164+08:00 stdout F return __callback(*args, **kwargs)
2024-11-21T14:41:49.431169923+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 561, in main
2024-11-21T14:41:49.431174859+08:00 stdout F sf_node_eval_param = preprocess_sf_node_eval_param(
2024-11-21T14:41:49.431180758+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 328, in preprocess_sf_node_eval_param
2024-11-21T14:41:49.431185778+08:00 stdout F domaindata_id_to_dist_data(
2024-11-21T14:41:49.43119054+08:00 stdout F File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 180, in domaindata_id_to_dist_data
2024-11-21T14:41:49.431195645+08:00 stdout F assert dist_data.type in set(input_def.types)
2024-11-21T14:41:49.431200436+08:00 stdout F AssertionError

That is the content of 0.log. It appears to match the error log shown on the platform.
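Each line of 0.log carries a container-runtime prefix (timestamp, stream name, and an `F`/`P` tag) in front of the actual SecretFlow output, which makes it harder to compare with the platform log. A small sketch, assuming the log file has one record per line in the usual `<timestamp> <stdout|stderr> <F|P> <message>` form, that strips the prefix and pulls out the final traceback. The function names are made up for illustration.

```python
import re

# One record: "<RFC 3339 timestamp> <stream> <tag> <message>".
# The tag is F for a full line or P for a partial line.
CRI_LINE = re.compile(r"^(?P<ts>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP]) ?(?P<msg>.*)$")


def strip_cri_prefix(lines):
    """Yield only the application message from each prefixed log line."""
    for line in lines:
        text = line.rstrip("\n")
        match = CRI_LINE.match(text)
        # Lines without the expected prefix are passed through unchanged.
        yield match.group("msg") if match else text


def last_traceback(messages):
    """Return the messages from the last 'Traceback' line to the end, or []."""
    messages = list(messages)
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].startswith("Traceback (most recent call last):"):
            return messages[index:]
    return []


if __name__ == "__main__":
    with open("0.log", encoding="utf-8") as handle:
        print("\n".join(last_traceback(strip_cri_prefix(handle))))
```

With the prefix removed, the tail of 0.log can be diffed directly against the message the platform shows.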

zimu-yuxi commented 6 days ago

It can't be used this way. The union component outputs a table of type individual, whereas random split requires a vertical_table.
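To make the failure concrete: the traceback stops at `assert dist_data.type in set(input_def.types)`, a bare membership test that raises `AssertionError` with no message. Below is a rough, self-contained sketch of that kind of check with a readable error. `DistData`, `InputDef`, the input name `input_data`, and the two type strings are stand-ins assumed for illustration; they are not copied from the SecretFlow source.

```python
from dataclasses import dataclass

# Assumed type strings, for illustration only; check the component spec for real values.
INDIVIDUAL_TABLE = "sf.table.individual"
VERTICAL_TABLE = "sf.table.vertical_table"


@dataclass
class DistData:
    """Hypothetical stand-in for a component's input data descriptor."""
    name: str
    type: str


@dataclass
class InputDef:
    """Hypothetical stand-in for a component's declared input slot."""
    name: str
    types: list


def check_input_type(dist_data, input_def):
    """Same membership test as the failing assert, but with an explanation."""
    if dist_data.type not in set(input_def.types):
        raise TypeError(
            f"input '{input_def.name}' accepts {sorted(input_def.types)}, "
            f"but '{dist_data.name}' has type '{dist_data.type}'"
        )


if __name__ == "__main__":
    split_input = InputDef(name="input_data", types=[VERTICAL_TABLE])
    union_output = DistData(name="htot_mglryfmg_node_40_output_0", type=INDIVIDUAL_TABLE)
    check_input_type(union_output, split_input)  # raises TypeError
```

In this sketch the union output is rejected because its type is not in the list the split component declares, which is the situation described above.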

zimu-yuxi commented 6 days ago

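I recommend using the latest allinone package. Its error messages have been improved, so you can see the cause of a failure clearly from the message itself.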

Inclay97 commented 6 days ago

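It can't be used this way. The union component outputs a table of type individual, whereas random split requires a vertical_table.

So the only option is to run private set intersection (PSI) first and then do the random split?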

zimu-yuxi commented 6 days ago

Yes. Also, what are you trying to achieve by running a random split after the union?

Inclay97 commented 6 days ago

Understood, thanks.

Inclay97 commented 6 days ago

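I recommend using the latest allinone package. Its error messages have been improved, so you can see the cause of a failure clearly from the message itself.

Does that mean I need to download the latest allinone package and redeploy? Is there any other way to update?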

zimu-yuxi commented 5 days ago

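I recommend using the latest allinone package. Its error messages have been improved, so you can see the cause of a failure clearly from the message itself.

Does that mean I need to download the latest allinone package and redeploy? Is there any other way to update?

If you are only trying it out locally, I recommend downloading the latest allinone package and redeploying.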