secretflow / secretflow

A unified framework for privacy-preserving data analysis and machine learning
https://www.secretflow.org.cn/docs/secretflow/en/
Apache License 2.0

How do I configure aggregation_config when calling the groupby statistics component? Its type is object; which type should it use in attrs? #1405

Open gaodongdong-software opened 2 months ago

gaodongdong-software commented 2 months ago

[screenshot]

gaodongdong-software commented 2 months ago

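After I add string values to input/input_data/by, I get this error saying the columns cannot be found, even though the column names clearly exist in the file.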

[screenshot]

BrainWH commented 2 months ago

Hi, how are you invoking the groupby statistics component? Could you share more context?

gaodongdong-software commented 2 months ago

[screenshot] I created the task directly in Kuscia. The input data is the output of the system-test PSI (private set intersection) run, psi_output.csv.

BrainWH commented 2 months ago

Hi, if possible, please paste the contents of the screenshots as text so we can pinpoint the problem.

gaodongdong-software commented 2 months ago

```shell
export CTR_CERTS_ROOT=/home/kuscia/var/certs
curl -k -X POST 'https://localhost:8082/api/v1/job/create' \
  --header "Token: $(cat ${CTR_CERTS_ROOT}/token)" \
  --header 'Content-Type: application/json' \
  --cert ${CTR_CERTS_ROOT}/kusciaapi-server.crt \
  --key ${CTR_CERTS_ROOT}/kusciaapi-server.key \
  --cacert ${CTR_CERTS_ROOT}/ca.crt \
  -d '{
  "job_id": "job-alice-groupby",
  "initiator": "alice",
  "max_parallelism": 2,
  "tasks": [
    {
      "task_id": "job-groupby-statistics",
      "app_image": "secretflow-image",
      "parties": [
        { "domain_id": "alice", "role": "partner" },
        { "domain_id": "bob", "role": "partner" }
      ],
      "alias": "job-table-statistics",
      "dependencies": [],
      "task_input_config": "{\"sf_datasource_config\":{\"alice\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\"runtime_config\\":{\\"protocol\\":\\"REF2K\\",\\"field\\":\\"FM64\\"},\\"link_desc\\":{\\"connect_retry_times\\":60,\\"connect_retry_interval_ms\\":1000,\\"brpc_channel_protocol\\":\\"http\\",\\"brpc_channel_connection_type\\":\\"pooled\\",\\"recv_timeout_ms\\":1200000,\\"http_timeout_ms\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\"mode\\": \\"PHEU\\", \\"schema\\": \\"paillier\\", \\"key_size\\": 2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"stats\",\"name\":\"groupby_statistics\",\"version\":\"0.0.3\",\"attr_paths\":[\"aggregation_config\",\"input/input_data/by\"],\"attrs\":[{\"s\":\"{\\"column_queries\\":[{\\"column_name\\":\\"age\\",\\"function\\":1},{\\"column_name\\":\\"contact_cellular\\",\\"function\\":1}]}\",\"ss\":[\"balance\", \"month_aug\"]}]},\"sf_input_ids\":[\"psi-output\"],\"sf_output_ids\":[\"groupby-output\"],\"sf_output_uris\":[\"\"]}",
      "priority": 100
    }
  ]
}'

echo -e "\n"
```
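The `task_input_config` in the script above nests JSON three levels deep: the job body, the `task_input_config` string inside it, and each device's `config` string inside that (the `aggregation_config` value is yet another JSON string). Escaping these layers by hand is easy to get wrong. Below is a minimal sketch, assuming Python 3 and only the standard `json` module, that builds the same request body from plain dicts and lets `json.dumps` generate every layer of quoting. The field names and values are copied from the script; `build_job_body` is a hypothetical helper, not part of Kuscia or SecretFlow.

```python
import json


def build_job_body(node_eval_param: dict) -> str:
    """Hypothetical helper: assemble the Kuscia job/create body from dicts.

    Each nested layer is serialized with json.dumps, so the escaping of
    quotes at every level is generated instead of typed by hand.
    """
    spu_config = {
        "runtime_config": {"protocol": "REF2K", "field": "FM64"},
        "link_desc": {
            "connect_retry_times": 60,
            "connect_retry_interval_ms": 1000,
            "brpc_channel_protocol": "http",
            "brpc_channel_connection_type": "pooled",
            "recv_timeout_ms": 1200000,
            "http_timeout_ms": 1200000,
        },
    }
    heu_config = {"mode": "PHEU", "schema": "paillier", "key_size": 2048}

    task_input_config = {
        "sf_datasource_config": {
            "alice": {"id": "default-data-source"},
            "bob": {"id": "default-data-source"},
        },
        "sf_cluster_desc": {
            "parties": ["alice", "bob"],
            "devices": [
                # Innermost layer: each device config is itself a JSON string.
                {"name": "spu", "type": "spu", "parties": ["alice", "bob"],
                 "config": json.dumps(spu_config)},
                {"name": "heu", "type": "heu", "parties": ["alice", "bob"],
                 "config": json.dumps(heu_config)},
            ],
            "ray_fed_config": {"cross_silo_comm_backend": "brpc_link"},
        },
        "sf_node_eval_param": node_eval_param,
        "sf_input_ids": ["psi-output"],
        "sf_output_ids": ["groupby-output"],
        "sf_output_uris": [""],
    }

    job = {
        "job_id": "job-alice-groupby",
        "initiator": "alice",
        "max_parallelism": 2,
        "tasks": [{
            "task_id": "job-groupby-statistics",
            "app_image": "secretflow-image",
            "parties": [{"domain_id": "alice", "role": "partner"},
                        {"domain_id": "bob", "role": "partner"}],
            "alias": "job-table-statistics",
            "dependencies": [],
            # Middle layer: task_input_config is a JSON string inside the job JSON.
            "task_input_config": json.dumps(task_input_config),
            "priority": 100,
        }],
    }
    # Outermost layer: the HTTP request body.
    return json.dumps(job)


if __name__ == "__main__":
    print(build_job_body({"domain": "stats", "name": "groupby_statistics",
                          "version": "0.0.3", "attr_paths": [], "attrs": []}))
```

One way to use it is to save it as, say, `build_body.py` (a hypothetical file name) and pipe its output into the same `curl` command with `-d @-` in place of the inline body; `@-` tells curl to read the request data from standard input.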

BrainWH commented 2 months ago

Hi, the value of `function` in `"attrs":[{"s":"{\"column_queries\":[{\"column_name\":\"age\",\"function\":1},{\"column_name\":\"contact_cellular\",\"function\":1}]}"` is incorrect. Set it to one of the following values, depending on what you need: COUNT, SUM, MEAN, MIN, MAX, VAR.

```python
# Maps each ColumnQuery.AggregationFunction enum member to its aggregation name.
ENUM_TO_STR = {
    ColumnQuery.AggregationFunction.COUNT: "count",
    ColumnQuery.AggregationFunction.SUM: "sum",
    ColumnQuery.AggregationFunction.MEAN: "mean",
    ColumnQuery.AggregationFunction.MIN: "min",
    ColumnQuery.AggregationFunction.MAX: "max",
    ColumnQuery.AggregationFunction.VAR: "var",
}
```
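The follow-up script in this thread passes the enum member name (`"MIN"`) instead of a number. Below is a minimal sketch for generating the `aggregation_config` string that way. It assumes the attribute string is read as the protobuf JSON form of a message with a repeated `column_queries` field, where each entry has `column_name` and `function` (the shape used in both scripts), and that an enum field can be given by its member name, as protobuf's JSON mapping allows. `build_aggregation_config` is a hypothetical helper, not a SecretFlow API. It rejects any name outside the list in the reply above.

```python
import json

# Aggregation function names listed in the reply above.
ALLOWED_FUNCTIONS = {"COUNT", "SUM", "MEAN", "MIN", "MAX", "VAR"}


def build_aggregation_config(queries):
    """Hypothetical helper: turn (column_name, function) pairs into the JSON
    string used as the `s` value of the aggregation_config attribute."""
    column_queries = []
    for column_name, function in queries:
        name = function.upper()  # accept "min" as well as "MIN"
        if name not in ALLOWED_FUNCTIONS:
            raise ValueError(
                f"unsupported function {function!r} for column {column_name!r}; "
                f"expected one of {sorted(ALLOWED_FUNCTIONS)}"
            )
        column_queries.append({"column_name": column_name, "function": name})
    return json.dumps({"column_queries": column_queries})


print(build_aggregation_config([("age", "MIN"), ("contact_cellular", "min")]))
# {"column_queries": [{"column_name": "age", "function": "MIN"}, {"column_name": "contact_cellular", "function": "MIN"}]}
```

Validating names locally catches a typo such as `"MINIMUM"` before the job reaches Kuscia, where the same mistake only shows up in the task logs.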

gaodongdong-software commented 2 months ago

After I changed the function values, I got a different error: [screenshot]

Here is the script I ran:

```shell
export CTR_CERTS_ROOT=/home/kuscia/var/certs
curl -k -X POST 'https://localhost:8082/api/v1/job/create' \
  --header "Token: $(cat ${CTR_CERTS_ROOT}/token)" \
  --header 'Content-Type: application/json' \
  --cert ${CTR_CERTS_ROOT}/kusciaapi-server.crt \
  --key ${CTR_CERTS_ROOT}/kusciaapi-server.key \
  --cacert ${CTR_CERTS_ROOT}/ca.crt \
  -d '{
  "job_id": "job-alice-groupby",
  "initiator": "alice",
  "max_parallelism": 2,
  "tasks": [
    {
      "task_id": "job-groupby-statistics",
      "app_image": "secretflow-image",
      "parties": [
        { "domain_id": "alice", "role": "partner" },
        { "domain_id": "bob", "role": "partner" }
      ],
      "alias": "job-table-statistics",
      "dependencies": [],
      "task_input_config": "{\"sf_datasource_config\":{\"alice\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\"runtime_config\\":{\\"protocol\\":\\"REF2K\\",\\"field\\":\\"FM64\\"},\\"link_desc\\":{\\"connect_retry_times\\":60,\\"connect_retry_interval_ms\\":1000,\\"brpc_channel_protocol\\":\\"http\\",\\"brpc_channel_connection_type\\":\\"pooled\\",\\"recv_timeout_ms\\":1200000,\\"http_timeout_ms\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\"mode\\": \\"PHEU\\", \\"schema\\": \\"paillier\\", \\"key_size\\": 2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"stats\",\"name\":\"groupby_statistics\",\"version\":\"0.0.3\",\"attr_paths\":[\"aggregation_config\",\"input/input_data/by\"],\"attrs\":[{\"s\":\"{\\"column_queries\\":[{\\"column_name\\":\\"age\\",\\"function\\":\\"MIN\\"},{\\"column_name\\":\\"contact_cellular\\",\\"function\\":\\"MIN\\"}]}\",\"ss\":[\"id1\",\"id2\"]}]},\"sf_input_ids\":[\"psi-output\"],\"sf_output_ids\":[\"groupby-output\"],\"sf_output_uris\":[\"\"]}",
      "priority": 100
    }
  ]
}'

echo -e "\n"
```

BrainWH commented 2 months ago

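Hi, judging from the current error message, the number of input parameters looks wrong. Try comparing them side by side: run the same component on the pad (SecretPad), then check the number of input parameters in the logs inside the Kuscia container and compare it with your script.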
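One possible reading of "the number of input parameters looks wrong": in both scripts `attr_paths` lists two paths, but `attrs` holds a single object that carries both an `s` and an `ss` value. Assuming, as in SecretFlow's `NodeEvalParam`, that `attrs[i]` supplies the value for `attr_paths[i]`, each path needs its own entry. Below is a minimal sketch that builds `sf_node_eval_param` that way and checks the counts before submission. `build_node_eval_param` and `check_node_eval_param` are hypothetical helpers, and the columns `id1`/`id2` are copied from the second script.

```python
import json


def build_node_eval_param(aggregation_config: str, by_columns: list) -> dict:
    """Hypothetical helper: one attrs entry per attr_paths entry, same order."""
    return {
        "domain": "stats",
        "name": "groupby_statistics",
        "version": "0.0.3",
        "attr_paths": ["aggregation_config", "input/input_data/by"],
        "attrs": [
            {"s": aggregation_config},  # value for attr_paths[0]
            {"ss": by_columns},         # value for attr_paths[1]
        ],
    }


def check_node_eval_param(param: dict) -> None:
    """Hypothetical sanity check: attr_paths and attrs must pair up one-to-one."""
    n_paths, n_attrs = len(param["attr_paths"]), len(param["attrs"])
    if n_paths != n_attrs:
        raise ValueError(f"{n_paths} attr_paths but {n_attrs} attrs")


agg = json.dumps({"column_queries": [
    {"column_name": "age", "function": "MIN"},
    {"column_name": "contact_cellular", "function": "MIN"},
]})
param = build_node_eval_param(agg, ["id1", "id2"])
check_node_eval_param(param)  # passes: two paths, two attrs

# The shape used in the scripts above: two paths, one combined attr object.
bad = dict(param, attrs=[{"s": agg, "ss": ["id1", "id2"]}])
try:
    check_node_eval_param(bad)
except ValueError as err:
    print("rejected:", err)  # rejected: 2 attr_paths but 1 attrs
```

The resulting dict is what goes under `sf_node_eval_param` in `task_input_config`. Comparing it against the parameters logged inside the Kuscia container, as suggested above, should confirm whether the attr count is the cause.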