secretflow / psi

The repo of Private Set Intersection (PSI) and Private Information Retrieval (PIR) from SecretFlow.
https://www.secretflow.org.cn/docs/psi
Apache License 2.0

[Bug]: RR22_LOWCOMM_PSI_2PC is highly sensitive to jitter #137

Open Fissure45 opened 1 month ago

Fissure45 commented 1 month ago

Issue Type

Usability

Modules Involved

PSI

Have you reproduced the bug with SPU HEAD?

No

Have you searched existing issues?

Yes

SPU Version

spu 0.8.0b0

OS Platform and Distribution

CentOS

Python Version

3.9

Compiler Version

No response

Current Behavior?

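While testing how different PSI algorithms behave under constrained network conditions, I found that RR22_LOWCOMM appears to be sensitive to jitter. At 10 Mbps, with 180 ms of delay plus 45 ms of jitter, the intersection task establishes its connection but data transfer then runs into problems. Is this an implementation issue or a limitation of the algorithm? I will run further tests with different delay and jitter values.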

Standalone code to reproduce the issue

See above.

Relevant log output

See above.
6fj commented 1 month ago

hi @Fissure45

By jitter, do you mean that the maximum delay can reach 225 ms? Could you post the error log? Thanks.
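The 225 ms figure follows from treating jitter as a symmetric variation around the base delay. A minimal sketch of that arithmetic, assuming the emulator varies the delay within base ± jitter (the function name `delay_bounds` is made up for illustration):

```python
def delay_bounds(base_ms: int, jitter_ms: int) -> tuple[int, int]:
    """Return (min, max) one-way delay in ms, assuming delay = base +/- jitter."""
    return base_ms - jitter_ms, base_ms + jitter_ms


# The two jitter settings discussed in this issue, both on a 180 ms base delay.
for jitter in (45, 90):
    low, high = delay_bounds(180, jitter)
    print(f"180 ms delay, {jitter} ms jitter: {low} ms to {high} ms")
```

Under this assumption, 45 ms of jitter gives a range of 135 ms to 225 ms, and 90 ms of jitter gives 90 ms to 270 ms.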

6fj commented 1 month ago

Could you post the complete logs from both sides? Thanks.

Fissure45 commented 1 month ago

Receiver:
[2024-05-30 14:52:07.199] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"RR22_LOWCOMM_PSI_2PC","receiver_rank":1,"broadcast_result":true,"input_params":{"path":"/opt/1000w.csv","select_fields":["id"]},"output_params":{"path":"/opt/tmp/1000w.csv","need_sort":true},"curve_type":"CURVE_25519","bucket_size":1048576}
[2024-05-30 14:52:07.199] [info] [bucket_psi.cc:400] bucket size set to 1048576
[2024-05-30 14:52:07.595] [info] [bucket_psi.cc:293] begin progress callback loop thread, interval:5000
[2024-05-30 14:52:07.595] [info] [bucket_psi.cc:252] Begin sanity check for input file: /opt/1000w.csv, precheck_switch:false
bucket psi config is protocol: RR22_LOWCOMM_PSI_2PC, broadcast_result: True, receiver_rank: 1, selected_fields: ['id'], precheck_input: False, output_sort: True, bucket_size: 1048576
id_0 = 10.218.184.238:1213
id_1 = 0.0.0.0:1213
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
progress callback ---- percentage: 0, total: 3, finished: 0, running: 0, description: Precheck, 0%
(a number of identical callback lines omitted)

Sender:
[2024-05-30 08:52:07.418] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"RR22_LOWCOMM_PSI_2PC","receiver_rank":1,"broadcast_result":true,"input_params":{"path":"/opt/1000w.csv","select_fields":["id"]},"output_params":{"path":"/opt/tmp/1000w.csv","need_sort":true},"curve_type":"CURVE_25519","bucket_size":1048576}
[2024-05-30 08:52:07.418] [info] [bucket_psi.cc:400] bucket size set to 1048576

lq0404510 commented 1 month ago

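I simulated 10 Mbps with 180 ms of delay plus 45 ms of jitter and could not reproduce this on my side. Without the jitter constraint, does the algorithm complete the task normally for you?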

Fissure45 commented 1 month ago

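Yes. In the tests two days ago, with no jitter configured, 10 Mbps + 180 ms of delay completed the task normally. I then added jitter of 90 ms and of 45 ms, and both runs failed. Note that because the data set is fairly large (100 million vs. 10 million records), the 100 million records were split into 10 parts when the task was created, and the parts were launched automatically in sequence at run time. With 90 ms of jitter, the first subtask failed outright. With 45 ms of jitter, the first subtask succeeded and the second failed, so the failure at 45 ms may have been caused by jitter interfering with scheduling. If you are running a single task, you could try 90 ms of jitter or more, with a larger data set, to reproduce it.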

lq0404510 commented 1 month ago

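Hi, I also tried 180 ms of delay plus 90 ms of jitter with 100 million vs. 1 million records, and still could not reproduce it. I used tc to simulate the jitter. Which tool did you use?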

Fissure45 commented 1 month ago

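I also used tc to simulate the jitter, with jitter smoothing set to 25%. I have no further information to add for now. I will report back if the test conditions or results change.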
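For anyone trying to reproduce this, here is a rough sketch of how the described conditions might map onto a single `tc` netem rule. It is not the exact command used in these tests. It assumes the 25% smoothing corresponds to netem's correlation parameter, that the 10 Mbps limit is applied with netem's `rate` option, and that the interface is named `eth0`. The helper `netem_command` is hypothetical and only builds the command string. It does not run it.

```python
import shlex


def netem_command(dev: str, delay_ms: int, jitter_ms: int,
                  correlation_pct: int, rate_mbit: int) -> str:
    """Build (but do not execute) a tc netem command line for the given conditions."""
    args = [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        # netem syntax: delay TIME [JITTER [CORRELATION]]
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms", f"{correlation_pct}%",
        # bandwidth cap; assumed here to be set via netem's rate option
        "rate", f"{rate_mbit}mbit",
    ]
    return shlex.join(args)


# Conditions from this issue: 10 Mbps, 180 ms delay, 45 ms jitter, 25% correlation.
print(netem_command("eth0", 180, 45, 25, 10))
```

Running the resulting command requires root privileges on the test host, and the interface name needs to match the machine's actual network device.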

lq0404510 commented 1 month ago

OK, we will keep following this issue.