vertica / ddR

Standard API for Distributed Data Structures in R
GNU General Public License v2.0

Possible to return DObject when subsetting? #20

Open clarkfitzg opened 8 years ago

clarkfitzg commented 8 years ago

Probably the most common operation in R is subsetting, i.e. [ and $. While trying out ddR and reading ops.R, I noticed that the subsetting operators all collect.

How would one return a distributed object from subsetting? For example, removing the 10% of rows of a dframe that contain NA still leaves a large dframe, so it's best to keep it distributed.

One idea is to make DObjects closed under subsetting and leave it to the user to call collect() explicitly.
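To make the idea concrete, here is a minimal base R sketch of subsetting that stays partitioned. It does not use ddR at all: a plain list of data frames stands in for a dframe, and make_parts, drop_na_rows and collect_parts are hypothetical helper names invented for this illustration, not part of ddR's API.

```r
# Stand-in for a distributed data frame: a plain list of partitions.
# This is NOT ddR's dframe; it only illustrates the closure idea.
make_parts <- function(df, nparts) {
  idx <- sort(rep_len(seq_len(nparts), nrow(df)))
  unname(split(df, idx))
}

# Subset each partition independently. The result is again a list of
# partitions, so nothing is gathered in one place.
drop_na_rows <- function(parts) {
  lapply(parts, function(p) p[complete.cases(p), , drop = FALSE])
}

# Explicit collection step, left to the caller.
collect_parts <- function(parts) do.call(rbind, parts)

df <- data.frame(x = c(1, NA, 3, 4, 5, NA), y = 1:6)
parts <- make_parts(df, nparts = 3)
filtered <- drop_na_rows(parts)      # still three partitions
result <- collect_parts(filtered)    # only now combined into one data.frame
```

The point is only that the filter returns the same kind of object it was given, so further operations can keep working per partition until the user decides to collect.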

lawremi commented 8 years ago

Yes, deferred filtering absolutely makes sense.
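One way to read "deferred filtering" is that a subset only records the predicate, and the rows are dropped when the data is finally collected. The sketch below shows that reading in base R under stated assumptions: a list of data frames stands in for the partitions, and lazy_parts, defer_filter and collect_lazy are hypothetical names, not ddR functions.

```r
# Hypothetical lazy wrapper: partitions plus a list of pending row predicates.
lazy_parts <- function(parts) list(parts = parts, filters = list())

# Record a predicate (a function of one partition returning a logical
# vector). No data is touched at this point.
defer_filter <- function(lp, pred) {
  lp$filters <- c(lp$filters, list(pred))
  lp
}

# Apply all pending predicates partition by partition, then combine.
collect_lazy <- function(lp) {
  kept <- lapply(lp$parts, function(p) {
    for (pred in lp$filters) p <- p[pred(p), , drop = FALSE]
    p
  })
  do.call(rbind, kept)
}

parts <- list(data.frame(x = c(1, NA), y = 1:2),
              data.frame(x = c(3, 4), y = 3:4))
lp <- lazy_parts(parts)
lp <- defer_filter(lp, function(p) complete.cases(p))
lp <- defer_filter(lp, function(p) p$y > 1)
result <- collect_lazy(lp)   # filters run here, once per partition
```

Stacking predicates this way lets several filters be applied in a single pass over each partition instead of materializing an intermediate result after every subset.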