visualfabriq / bquery

A query and aggregation framework for Bcolz (W2013-01)
https://www.visualfabriq.com
BSD 3-Clause "New" or "Revised" License

Iterblocks #53

Open waylonflinn opened 9 years ago

waylonflinn commented 9 years ago

Simplify the Cython code with bcolz.iterblocks, as in @FrancescElies's PR.
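For readers unfamiliar with the idea, block-wise iteration can be sketched in plain Python. The `iter_blocks` helper below is a hypothetical stand-in, not the bcolz function, and the parameter name `blen` is chosen only for illustration. It shows roughly how yielding fixed-size blocks can replace manual chunk bookkeeping in an aggregation loop.

```python
from typing import Iterator, Sequence


def iter_blocks(values: Sequence[int], blen: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `values`, each at most `blen` items long.

    Hypothetical stand-in for a block iterator; this is not the bcolz API.
    """
    if blen <= 0:
        raise ValueError("blen must be positive")
    for start in range(0, len(values), blen):
        yield values[start:start + blen]


def blockwise_sum(values: Sequence[int], blen: int) -> int:
    """Aggregate block by block instead of tracking chunk offsets by hand."""
    total = 0
    for block in iter_blocks(values, blen):
        total += sum(block)
    return total


data = list(range(10))
block_lengths = [len(block) for block in iter_blocks(data, 4)]
total = blockwise_sum(data, 4)
```

The last block is shorter than `blen` when the input length is not a multiple of it, so the aggregation loop does not need a separate code path for leftovers.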

waylonflinn commented 9 years ago

Adapted from #24

CarstVaartjes commented 9 years ago

Really nice work! The only thing I still need to check is whether we can do this inside a nogil prange (for multi-threading).

waylonflinn commented 9 years ago

I've also got one more thing I want to add before this gets merged.

waylonflinn commented 9 years ago

This is done, if you'd like to merge it.

CarstVaartjes commented 9 years ago

I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil

@FrancescElies there is no Cython implementation possible or planned, right? As it stands, we cannot multi-thread Python functions...

waylonflinn commented 9 years ago

I've been thinking a lot about this, and I'm planning a more general solution to problems like this. It will involve a map-reduce-style processing layer.

I don't think processing multiple columns simultaneously is the way forward. I'm going to focus on processing rows in a single chunk in parallel.
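A rough sketch of what map-reduce-style processing of row ranges within one chunk might look like, in plain Python. Everything here (`split_rows`, `partial_sum`, `parallel_chunk_sum`, the worker count) is hypothetical and only illustrates the shape of the idea, not the planned design. Pure-Python threads still run under the GIL, so this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_rows(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split [0, n_rows) into at most n_parts contiguous (start, stop) ranges."""
    step = max(1, -(-n_rows // n_parts))  # ceiling division, at least 1
    return [(start, min(start + step, n_rows)) for start in range(0, n_rows, step)]


def partial_sum(chunk: Sequence[int], bounds: Tuple[int, int]) -> int:
    """Map step: aggregate one row range of the chunk."""
    start, stop = bounds
    return sum(chunk[start:stop])


def parallel_chunk_sum(chunk: Sequence[int], n_workers: int = 4) -> int:
    """Map over row ranges in worker threads, then reduce the partial results."""
    ranges = split_rows(len(chunk), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda bounds: partial_sum(chunk, bounds), ranges))
    return sum(partials)  # reduce step


chunk = list(range(1000))
result = parallel_chunk_sum(chunk)
```

A sum is used because its partial results can be combined in any order; aggregations without that property would need a different reduce step.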



CarstVaartjes commented 9 years ago

Let's see. I'm really open to suggestions, but the problem is that a few of our algorithms depend on the processing order, so parallelism within blocks that disturbs that order can cause issues :/

P.S. Why do you think column-parallelism is a bad idea?
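To make the order-dependence concern concrete, here is a small sketch. The thread does not say which algorithms are affected, so factorization by first appearance is used purely as a hypothetical example, alongside an order-independent count for contrast.

```python
from collections import Counter
from typing import Dict, List


def factorize_blocks(blocks: List[List[str]]) -> Dict[str, int]:
    """Assign integer codes to values in order of first appearance."""
    codes: Dict[str, int] = {}
    for block in blocks:
        for value in block:
            if value not in codes:
                codes[value] = len(codes)
    return codes


def count_blocks(blocks: List[List[str]]) -> Counter:
    """Count occurrences; the result does not depend on block order."""
    counts: Counter = Counter()
    for block in blocks:
        counts.update(block)
    return counts


blocks = [["b", "a"], ["c", "a"], ["d"]]
reordered = [blocks[2], blocks[0], blocks[1]]  # simulates out-of-order completion

in_order_codes = factorize_blocks(blocks)
reordered_codes = factorize_blocks(reordered)
```

Both runs see the same values, but the codes differ once the blocks arrive in a different order, whereas the counts stay the same. An order-sensitive step like this would need either in-order processing or a final pass that reconciles the per-block results.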

FrancescElies commented 9 years ago

"...no cython implementation possible/planned?" Not sure I understand the question.

CarstVaartjes commented 9 years ago

Sorry! To explain myself: iterblocks is currently defined as a normal Python function here: https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py

Unless it is a Cython function declared nogil, we cannot call it from multiple threads, so that's why we cannot use it (even though it makes the code much nicer, as you can see in this pull request).

FrancescElies commented 9 years ago

Not that I'm aware of, but you could ask on the mailing list.
