secretflow / psi

The repo of Private Set Intersection (PSI) and Private Information Retrieval (PIR) from SecretFlow.
https://www.secretflow.org.cn/docs/psi
Apache License 2.0

Question about the input data for the psi_df function #143

Closed XiaoLazi closed 3 weeks ago

XiaoLazi commented 3 weeks ago

Describe the bug

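When running PSI with psi_df across two machines, I have a question: how do I transfer the other party's PYUObject data to my side? Or, if it is not transferred and the data stays with the other party, how does my program locate that data? (With psi_csv you configure the path to the other party's data; I don't understand how psi_df is supposed to transfer or configure it.)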

Steps To Reproduce

This was run locally:


import pandas as pd
import pymysql
import ray
import secretflow as sf

# Restart SecretFlow in single-machine simulation mode with three parties.
sf.shutdown()
sf.init(['alice', 'bob', 'carol'], address='local')

# All three tables are read from the same MySQL instance.
conn = pymysql.connect(host='10.3.0.12', port=3306, user='root', passwd='root',
                       database='prac', charset='utf8', use_unicode=True)
sql_1 = 'select * from alice'
sql_2 = 'select * from bob'
sql_3 = 'select * from carol'

# Subsample each table so the three inputs only partially overlap.
da = pd.read_sql(sql_1, conn).sample(frac=0.9)
db = pd.read_sql(sql_2, conn).sample(frac=0.8)
dc = pd.read_sql(sql_3, conn).sample(frac=0.7)

# Put each DataFrame into the Ray object store.
a_obj_ref = ray.put(da)
b_obj_ref = ray.put(db)
c_obj_ref = ray.put(dc)

# Wrap each object reference in a PYUObject owned by the matching party.
alice = sf.PYUObject(sf.PYU('alice'), a_obj_ref)
bob = sf.PYUObject(sf.PYU('bob'), b_obj_ref)
carol = sf.PYUObject(sf.PYU('carol'), c_obj_ref)

spu_3pc = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob', 'carol']))

# Three-party ECDH PSI on the (uid, month) key, with alice as the receiver.
psi_3pc = spu_3pc.psi_df(['uid', 'month'], [alice, bob, carol], 'alice', protocol='ECDH_PSI_3PC')
for d in psi_3pc:
    print(d)
    print(d.device)
    print(d.data)
    print(ray.get(d.data))
    print(type(ray.get(d.data)))
#    print(type(d.device))
#    sf.PYU(d.device).dump(obj=d,path='./output.csv')
print(type(psi_3pc[0]))
print(type(psi_3pc[0].device))

Expected behavior

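When running locally, the input PYUObjects can be created directly in the program: alice,bob,carol = sf.PYUObject(sf.PYU('alice'),a_obj_ref),sf.PYUObject(sf.PYU('bob'),b_obj_ref),sf.PYUObject(sf.PYU('carol'),c_obj_ref). Question: if this runs on two machines, how do I get the other party's PYUObject over to my side?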

Version

SecretFlow 1.6.1b0

Operating system

CentOS 7 x64

Hardware Resources

8C80G

6fj commented 3 weeks ago

So what exactly is your question? Are you asking how to use psi_df, or do you have a new feature request?

XiaoLazi commented 3 weeks ago

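@6fj When this runs on two machines, the program on alice's machine needs bob's PYUObject, and I don't understand how alice obtains bob's PYUObject in her program. (My understanding is that bob's data lives on bob's machine, so bob creates the PYUObject in his own program.)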

6fj commented 3 weeks ago

A PYU object is essentially a Ray remote object; it does not itself contain the actual data. During PSI, alice never touches bob's actual data.
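To illustrate why alice never needs bob's raw rows, here is a toy sketch of the double-blinding idea behind ECDH-style PSI. It is an assumption-laden simplification: it uses modular exponentiation over a small prime field instead of an elliptic curve, runs both parties in one process, and is not secure or representative of SecretFlow's actual implementation. The names `hash_to_group`, `blind`, and `toy_ecdh_psi` are hypothetical.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). For illustration only, NOT secure.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero residue modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values, secret):
    """Raise every value to a party's private exponent."""
    return [pow(v, secret, P) for v in values]


def toy_ecdh_psi(alice_items, bob_items):
    """Return alice's items that also appear in bob's list."""
    a = secrets.randbelow(P - 3) + 2  # alice's private exponent
    b = secrets.randbelow(P - 3) + 2  # bob's private exponent

    # Round 1: each party blinds its own hashed items with its own secret.
    alice_once = blind([hash_to_group(x) for x in alice_items], a)
    bob_once = blind([hash_to_group(y) for y in bob_items], b)

    # Round 2: each party blinds the values received from the other side.
    # Only blinded values cross the wire; raw items never leave their owner.
    alice_twice = blind(alice_once, b)   # done by bob, returned in order
    bob_twice = set(blind(bob_once, a))  # done by alice

    # H(x)^(a*b) == H(y)^(b*a) exactly when x == y (up to negligible collisions).
    return [x for x, t in zip(alice_items, alice_twice) if t in bob_twice]


print(toy_ecdh_psi(["u1", "u2", "u3"], ["u2", "u3", "u4"]))
```

In a real deployment the two rounds are network messages between machines, which is why the caller only ever passes handles to each party's data rather than the data itself.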

XiaoLazi commented 3 weeks ago

@6fj alice_ref = ray.put(alice_data); alice = sf.PYUObject(sf.PYU('alice'),alice_ref); spu.psi_df(['id'],[alice,xx],'alice','alice',protocol='ECDH_PSI_3PC') What I mean is: when alice's program runs PSI, the psi_df argument at position xx has to be bob's PYUObject. How do I obtain bob's PYUObject? Thanks.

6fj commented 3 weeks ago

Here bob's PYUObject is just a Ray remote object that points to bob's actual data. What do you mean by "obtain"?
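One possible way to get such a handle in a two-machine setup, sketched under several assumptions: both parties run the same script, each passing its own party name, and bob's table is loaded by a function scheduled on the bob PYU, so the returned PYUObject is only a reference on alice's side. The IP addresses, ports, CSV paths, and helper names (`build_cluster_config`, `load_table`, `run_psi`) are hypothetical, and the config key names follow the SecretFlow deployment docs as best recalled; check them against your installed version.

```python
def build_cluster_config(self_party: str) -> dict:
    """Cross-party cluster config; all addresses are placeholders."""
    return {
        "parties": {
            "alice": {"address": "192.0.2.10:20001", "listen_addr": "0.0.0.0:20001"},
            "bob": {"address": "192.0.2.11:20001", "listen_addr": "0.0.0.0:20001"},
        },
        "self_party": self_party,
    }


def load_table(path: str):
    """Runs on the owning party's machine; the path only has to exist there."""
    import pandas as pd

    return pd.read_csv(path)


def run_psi(self_party: str, ray_address: str):
    """Both machines call this, each with its own party name and Ray address."""
    import secretflow as sf
    import spu

    sf.init(address=ray_address, cluster_config=build_cluster_config(self_party))
    alice, bob = sf.PYU("alice"), sf.PYU("bob")

    # Each call returns a PYUObject handle; the DataFrame stays with its owner.
    alice_df = alice(load_table)("/data/alice.csv")  # hypothetical path
    bob_df = bob(load_table)("/data/bob.csv")        # hypothetical path

    # Assumed SPU cluster definition; ports differ from the ones above.
    spu_device = sf.SPU({
        "nodes": [
            {"party": "alice", "address": "192.0.2.10:21001"},
            {"party": "bob", "address": "192.0.2.11:21001"},
        ],
        "runtime_config": {
            "protocol": spu.spu_pb2.SEMI2K,
            "field": spu.spu_pb2.FM128,
        },
    })

    # bob_df fills the "xx" slot: a handle to bob's data, not the data itself.
    return spu_device.psi_df(
        ["uid", "month"], [alice_df, bob_df], "alice", protocol="ECDH_PSI_2PC"
    )
```

With this pattern alice's program never calls ray.put on bob's data; it only names the bob device and the loader to run there.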

XiaoLazi commented 3 weeks ago

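@6fj Thanks for the answer. I'm not familiar with Ray, which is used underneath, so I don't know what to put in place of xx. I'll probably read the Ray documentation first and then come back to this question. Thank you very much!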