kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

KITE-1038: Allow to Optimize by applying constraints on the server if the underlying dataset supports it #395

Open prazanna opened 9 years ago

prazanna commented 9 years ago

The HBaseServerSideFilterable SPI allows clients to set server-side filters on an HBase Scan through the EntityScannerBuilder interface.

rdblue commented 9 years ago

Hey, I haven't had a long look at this, but I'm not sure about the newReader modification. I think there might be a better API that works more closely with Constraints. I haven't thought it through completely, but I wanted to give you some feedback to start thinking about. I'm at OSCON for the rest of this week and will try to follow up. Thanks, @prazanna!

prazanna commented 9 years ago

Hey @rdblue, thanks for taking a look. Yes, I will think about this a little more and maybe propose a different API using Constraints.

rdblue commented 9 years ago

Thanks @prazanna! I'm really sorry to give you that feedback without a more thought-out fix. Something involving constraints would be great, possibly a registration process that can apply a server-side filter when it sees a particular constraint. Then we could have a way to get the residual constraints (those not implemented server-side) that should be applied client-side. That would be great!
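
To make the registration idea concrete, here is a minimal sketch in plain Java. It is only an illustration, not code from this PR or from Kite: `FieldConstraint`, `PushdownRegistry`, and `PushdownPlan` are hypothetical names, and server-side filters are represented by a generic type (plain strings in the demo) rather than real HBase filter objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for one constraint on one field (not Kite's Constraints class).
final class FieldConstraint {
    final String field;
    final String op;    // e.g. "eq", "in", "regex"
    final Object value;

    FieldConstraint(String field, String op, Object value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }

    @Override
    public String toString() {
        return field + " " + op + " " + value;
    }
}

// Result of planning: filters to run on the server plus constraints left for the client.
final class PushdownPlan<F> {
    final List<F> serverFilters;
    final List<FieldConstraint> residual;

    PushdownPlan(List<F> serverFilters, List<FieldConstraint> residual) {
        this.serverFilters = Collections.unmodifiableList(serverFilters);
        this.residual = Collections.unmodifiableList(residual);
    }
}

// Hypothetical registry: maps a constraint kind to a translator that builds a server-side filter.
final class PushdownRegistry<F> {
    private final Map<String, Function<FieldConstraint, F>> translators = new LinkedHashMap<>();

    void register(String op, Function<FieldConstraint, F> translator) {
        translators.put(op, translator);
    }

    // Constraints with a registered translator are pushed down; the rest become residuals.
    PushdownPlan<F> plan(List<FieldConstraint> constraints) {
        List<F> server = new ArrayList<>();
        List<FieldConstraint> residual = new ArrayList<>();
        for (FieldConstraint c : constraints) {
            Function<FieldConstraint, F> translator = translators.get(c.op);
            if (translator != null) {
                server.add(translator.apply(c));
            } else {
                residual.add(c);
            }
        }
        return new PushdownPlan<>(server, residual);
    }
}

final class PushdownDemo {
    public static void main(String[] args) {
        PushdownRegistry<String> registry = new PushdownRegistry<>();
        // Only equality is registered here; a string stands in for a real server-side filter.
        registry.register("eq", c -> "ServerFilter[" + c.field + " == " + c.value + "]");

        List<FieldConstraint> constraints = new ArrayList<>();
        constraints.add(new FieldConstraint("id", "eq", 42));
        constraints.add(new FieldConstraint("name", "regex", "a.*"));

        PushdownPlan<String> plan = registry.plan(constraints);
        System.out.println("server-side: " + plan.serverFilters);
        System.out.println("residual:    " + plan.residual);
    }
}
```

In this sketch the reader would hand `serverFilters` to the storage layer and evaluate `residual` against each returned entity, so supporting one more predicate on the server only means registering one more translator.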

prazanna commented 9 years ago

Hey @rdblue, sorry it took some time before I could circle back to this. I have taken a shot at optimizing constraints so that they are applied server-side when the underlying dataset reader supports it.

I am not very happy with the current design, but every alternative I considered has roughly equal positives and negatives. So I thought I would bounce this off you and hear what you think.

Like:

Don't like:

Would like your thoughts on it when you have time. Thanks.

rdblue commented 9 years ago

I like the direction you're headed with this; it's looking great! From quickly looking at the filter package in HBase, I think we should be able to push most of the supported predicates to the server side, but it's fine to just support a few for optimization right now. And feel free to bring in mockito! I think we use it elsewhere, and I'm all for better tests.