secretflow / spu

SPU (Secure Processing Unit) aims to be a provable, measurable secure computation device, which provides computation ability while keeping your private data protected.
https://www.secretflow.org.cn/docs/spu/en/
Apache License 2.0

[Question]: Does SPU support bigdata or parallel computing #761

Closed · xyz-scorpio closed this 3 weeks ago

xyz-scorpio commented 1 month ago

Feature Request Type

Performance

Have you searched existing issues?

Yes

Describe features you want to add to SPU

Just want to know whether SPU supports big data / parallel computing, and if so, in what way it supports such a thing. Thank you.

tpppppub commented 1 month ago

There are various granularities of parallelism and vectorization within the SPU. It's not clear what your specific requirements are. How much data do you need to process and what granularity of parallelism do you need?

xyz-scorpio commented 1 month ago

[Figure: SPU workflow diagram]

Let us take the SPU workflow as an example. My questions are: i) What is the upper bound on the data scale that SPU can handle? If I write code to train a model with TensorFlow on very large datasets (e.g. ~TB) from different parties, how does SPU handle that? Will it do data parallelism automatically? ii) How many resources can an SPU VM take advantage of? Say I have 4 AWS EC2 instances: can one SPU VM take advantage of all 4, or just one instance? And what is the parallelism model of the SPU VM?

anakinxc commented 1 month ago

  1. We do not have a hard limit on data size. It is up to frameworks like TensorFlow to properly handle such large data through batching or other techniques (see the sketch below).
  2. At this point, SPU can only take advantage of one machine. Right now, SPU only supports data-level parallelism (DLP) and instruction-level parallelism (ILP) on a single machine.
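
To make point 1 concrete, here is a minimal sketch of caller-side batching through SPU's Python simulation utilities. It assumes the `spu` package and its `spu.utils.simulation` module (exact module paths and enum names may differ across versions); the function, protocol choice, and batch size are illustrative, not a prescribed API for large-scale training.

```python
# Minimal sketch: the calling framework, not SPU, splits a large
# dataset into batches, so peak secret-shared memory is bounded by
# the batch size rather than the dataset size.
# Assumes spu.utils.simulation (module paths may vary by version).
import jax.numpy as jnp
import numpy as np

import spu
import spu.utils.simulation as spsim


def predict(x, w):
    # Plain JAX; SPU compiles and runs it under MPC.
    return jnp.dot(x, w)


# 3-party ABY3 simulator over a 64-bit ring (illustrative choices).
sim = spsim.Simulator.simple(3, spu.ProtocolKind.ABY3, spu.FieldType.FM64)
secure_predict = spsim.sim_jax(sim, predict)

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 32)).astype(np.float32)
weights = rng.normal(size=(32,)).astype(np.float32)

# Feed SPU one fixed-size batch at a time instead of one huge tensor.
batch_size = 1_000
outputs = [
    secure_predict(features[i : i + batch_size], weights)
    for i in range(0, len(features), batch_size)
]
result = np.concatenate(outputs)
print(result.shape)  # (10000,)
```

The same pattern carries over to a real deployment: the driving framework slices the dataset and feeds SPU one batch at a time, which is what handling large data through batching amounts to in practice.
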
anakinxc commented 3 weeks ago

No activity. Closing.