noirello / pyorc

Python module for Apache ORC file format
Apache License 2.0

Reader can filter #35

Closed jruere closed 3 years ago

jruere commented 3 years ago

I'd like to read only the data which matches some criteria but I don't want to implement code handling Stripes and filtering on every project.

Could it be implemented in the Reader?

noirello commented 3 years ago

There's a PR about predicate pushdown for the C++ Reader. I'd rather hold off on implementing any filtering mechanism until it's merged.

jruere commented 3 years ago

Absolutely! Glad I asked.

On Sat, 20 Feb 2021, 19:03 noirello, notifications@github.com wrote:

There's a PR https://github.com/apache/orc/pull/476 about predicate pushdown for the C++ Reader. I'd rather hold off on implementing any filtering mechanism until it's merged.


fehtemam commented 3 years ago

Seems like the PR is merged there. Any plans to have a row filter in the Reader?

noirello commented 3 years ago

I've started a branch about predicate filtering.

noirello commented 3 years ago

The new release includes a new feature: predicates. It might not be exactly what you're looking for, but it helps to reduce the result set using a filtering expression. There's a simple example in the docs.
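To illustrate why a predicate "helps to reduce the result set" without necessarily returning only matching rows, here is a minimal pure-Python sketch of statistics-based skipping. It is not pyorc's API: `RowGroup` and `read_greater_than` are hypothetical names, and the sketch assumes that pushdown decides per row group from min/max statistics, which is how ORC predicate pushdown is generally described.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class RowGroup:
    """Hypothetical stand-in for an ORC row group holding one integer column."""
    rows: List[int]

    @property
    def maximum(self) -> int:
        # In a real file this would come from stored statistics, not a scan.
        return max(self.rows)


def read_greater_than(groups: List[RowGroup], threshold: int) -> Iterator[int]:
    """Yield rows from every group that could contain a value > threshold.

    A group whose maximum is <= threshold cannot match and is skipped.
    A group that passes is returned whole, so non-matching rows may remain.
    """
    for group in groups:
        if group.maximum <= threshold:
            continue  # the whole group is skipped without reading its rows
        yield from group.rows


groups = [RowGroup([1, 2, 3]), RowGroup([4, 50, 6]), RowGroup([70, 80, 90])]

# Coarse result: the first group is skipped, the second is returned whole.
coarse = list(read_greater_than(groups, 10))

# An exact result still needs a row-level filter on top.
exact = [value for value in coarse if value > 10]
```

Under these assumptions, code that needs exact matches would still apply a row-level check to whatever the reader returns; the pyorc docs describe the actual predicate interface.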

jruere commented 3 years ago

It seems to be what I had in mind. :D