secretflow / scql

SCQL (Secure Collaborative Query Language) is a system that allows multiple distrusting parties to run joint analysis without revealing their private data.
https://www.secretflow.org.cn/docs/scql/en/
Apache License 2.0

[Question] Does SCQL support query jobs on big data? #318

Open xyz-scorpio opened 4 months ago

xyz-scorpio commented 4 months ago

Issue Type

Have you searched for existing issues?

Yes

Link to Relevant Documentation

No response

Question Details

What is the upper bound on the dataset size that SCQL can handle? For example, if I want to run a query job on two datasets from Alice and Bob, each around a terabyte (TB) in size, can SCQL handle that?

Also, does SCQL support distributed computing? If I have 4 AWS EC2 instances, can SCQL take advantage of all of their resources, and if so, how?
tongke6 commented 4 months ago

Hello @xyz-scorpio, SCQL is a system that implements SQL on top of secure multi-party computation (MPC). Because MPC is limited by its network communication, computation, and memory overhead, I think its upper bound is data analysis on the order of tens of millions of records within an acceptable time (e.g., < 6 hours).

For now, SCQL can use only one computing node per party to process a given query job, but different jobs can be scheduled to different computing nodes.
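To illustrate what this means for the 4-EC2 question, here is a minimal sketch of job-level (not intra-job) parallelism on one party's side. It is hypothetical: `assign_jobs_round_robin`, the node names, and the round-robin policy are assumptions for illustration, not SCQL's actual scheduler or API. Each whole job goes to exactly one node, so extra machines help only when several jobs run at once.

```python
from itertools import cycle


def assign_jobs_round_robin(job_ids: list[str], node_names: list[str]) -> dict[str, list[str]]:
    """Hypothetical scheduler: place each whole query job on exactly one node.

    A job is never split across nodes, mirroring the one-node-per-job
    limitation described above; parallelism comes only from running
    different jobs on different nodes. Round-robin is an assumed policy.
    """
    if not node_names:
        raise ValueError("at least one computing node is required")
    schedule: dict[str, list[str]] = {node: [] for node in node_names}
    # cycle() repeats the node list; zip() stops once every job is assigned.
    for job, node in zip(job_ids, cycle(node_names)):
        schedule[node].append(job)
    return schedule


if __name__ == "__main__":
    nodes = ["ec2-1", "ec2-2", "ec2-3", "ec2-4"]  # one party's hypothetical nodes
    jobs = ["q1", "q2", "q3", "q4", "q5", "q6"]
    print(assign_jobs_round_robin(jobs, nodes))
```

With six jobs and four nodes, `ec2-1` gets `q1` and `q5`, `ec2-2` gets `q2` and `q6`, and the other two nodes get one job each. A single large query still runs on a single node, which is why adding machines does not raise the per-query size limit.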

tongke6 commented 4 months ago

The exception is the private set intersection (PSI) scenario, which can scale well.