secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Update run_scql_on_kuscia_cn.md #443

Closed tarantula-leo closed 4 days ago

tarantula-leo commented 1 month ago

Parameters are missing from the example.

tarantula-leo commented 1 month ago

While reproducing the tutorial, I also found the following problems:

  1. When executing the SQL in the final step, an error reported that the alice-sql data under /home/kuscia/var/storage/data could not be found (it was present right after the kuscia container started). I am not sure whether the kuscia.sh script or some other step overwrote or deleted it; after re-importing the data from outside the container, the query ran successfully.
  2. The result of the SQL statement in the example does not match what is expected; I suggest adjusting the example: {"status":{"code":0,"message":"","details":[]},"result":{"affected_rows":"0","warnings":[{"reason":"for safety, we filter the results for groups which contain less than 4 items."}],"cost_time_s":11.258773053,"out_columns":[{"name":"credit_rank","shape":{"dim":[{"dim_value":"0"},{"dim_value":"1"}]},"elem_type":"INT64","option":"VALUE","annotation":{"status":"TENSORSTATUS_UNKNOWN"},"int32_data":[],"int64_data":[],"float_data":[],"double_data":[],"bool_data":[],"string_data":[],"data_validity":[],"ref_num":0},{"name":"cnt","shape":{"dim":[{"dim_value":"0"},{"dim_value":"1"}]},"elem_type":"INT64","option":"VALUE","annotation":{"status":"TENSORSTATUS_UNKNOWN"},"int32_data":[],"int64_data":[],"float_data":[],"double_data":[],"bool_data":[],"string_data":[],"data_validity":[],"ref_num":0},{"name":"avg_income","shape":{"dim":[{"dim_value":"0"},{"dim_value":"1"}]},"elem_type":"FLOAT64","option":"VALUE","annotation":{"status":"TENSORSTATUS_UNKNOWN"},"int32_data":[],"int64_data":[],"float_data":[],"double_data":[],"bool_data":[],"string_data":[],"data_validity":[],"ref_num":0},{"name":"avg_amount","shape":{"dim":[{"dim_value":"0"},{"dim_value":"1"}]},"elem_type":"FLOAT64","option":"VALUE","annotation":{"status":"TENSORSTATUS_UNKNOWN"},"int32_data":[],"int64_data":[],"float_data":[],"double_data":[],"bool_data":[],"string_data":[],"data_validity":[],"ref_num":0}]}}
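
For readers who want to check a response like this quickly, here is a minimal Python sketch that pulls out the warnings and a per-column row count. It assumes the first `dim_value` of each column's `shape` is the number of rows, which is an interpretation of the output above and not something confirmed in this thread; `summarize` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the response with only one column kept.

```python
import json

# Trimmed copy of the response above: one column kept, unused fields dropped.
SAMPLE = '''{"status":{"code":0,"message":"","details":[]},
"result":{"affected_rows":"0",
"warnings":[{"reason":"for safety, we filter the results for groups which contain less than 4 items."}],
"out_columns":[{"name":"credit_rank",
"shape":{"dim":[{"dim_value":"0"},{"dim_value":"1"}]},
"elem_type":"INT64","int64_data":[]}]}}'''


def summarize(response_text):
    """Return (warning reasons, {column name: row count}) for a query response.

    Assumption: the first entry of shape.dim holds the row count.
    """
    result = json.loads(response_text)["result"]
    warnings = [w["reason"] for w in result.get("warnings", [])]
    row_counts = {
        col["name"]: int(col["shape"]["dim"][0]["dim_value"])
        for col in result.get("out_columns", [])
    }
    return warnings, row_counts


warnings, row_counts = summarize(SAMPLE)
print(warnings)
print(row_counts)
```

Under that assumption, a row count of 0 together with the warning would suggest that no group survived the filter in this run.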
github-actions[bot] commented 1 week ago

Stale pull request message. Please comment to remove stale tag. Otherwise this pr will be closed soon.

tongke6 commented 5 days ago

LGTM

Thank you for your contribution!

tongke6 commented 5 days ago

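  1. Is there a way to reproduce this reliably? If so, please post it.
  2. The row order of a SQL result is not deterministic unless ORDER BY is specified; only then would a different order be an error. In this example there is no right or wrong order, so any order is correct.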
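
To illustrate the point about row order, here is a minimal Python sketch of comparing a query result against documented output without depending on order. The helper name `same_rows_ignoring_order` and the sample rows are hypothetical and are not taken from the tutorial.

```python
from collections import Counter


def same_rows_ignoring_order(actual, expected):
    """Compare two row lists as multisets, so order is ignored but duplicates count."""
    return Counter(map(tuple, actual)) == Counter(map(tuple, expected))


# Hypothetical (credit_rank, cnt) rows, for illustration only.
doc_rows = [(5, 10), (6, 12)]
run_rows = [(6, 12), (5, 10)]

print(same_rows_ignoring_order(run_rows, doc_rows))
```

This uses exact equality, so floating-point columns such as averages would need a tolerance-based comparison instead.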
tarantula-leo commented 5 days ago
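  1. OK, if it reproduces later I will open a separate issue.
  2. What I mean is that, following the tutorial, the default settings place a constraint on GROUP BY, and the actual execution result contains "reason":"for safety, we filter the results for groups which contain less than 4 items.", whereas the documentation shows results.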
tongke6 commented 5 days ago

2

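This warning is only a reminder that results may be filtered; whether filtering actually happens depends on whether any group has fewer than 4 items. In this case, all groups just happened to be > 4.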
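
The behaviour described here can be sketched in plain Python. This is only an illustration of the idea, not SCQL's implementation: the threshold of 4 is taken from the warning text, and the function name `group_count_with_threshold` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumed from the warning text ("less than 4 items"); not read from any SCQL config.
GROUP_BY_THRESHOLD = 4


def group_count_with_threshold(rows, key, threshold=GROUP_BY_THRESHOLD):
    """Count rows per group and drop groups with fewer than `threshold` rows.

    Returns (kept_counts, filtered), where `filtered` is True if any group was dropped.
    """
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    kept = {group: n for group, n in counts.items() if n >= threshold}
    return kept, len(kept) < len(counts)


# Hypothetical data: one group of 5 rows and one group of 3 rows.
rows = [{"credit_rank": 5}] * 5 + [{"credit_rank": 6}] * 3
kept, filtered = group_count_with_threshold(rows, "credit_rank")
print(kept, filtered)
```

In this sketch the warning would correspond to the filter always being in place, while `filtered` tells you whether any group was actually dropped for a given dataset.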