Open waylonflinn opened 9 years ago
Adapted from #24
Really nice work! The only thing I still need to check is whether we can do this inside a nogil prange (for multi-threading).
I've also got one more thing I want to add before this gets merged.
This is done, if you'd like to merge it.
I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil
@FrancescElies there is no Cython implementation possible or planned, right? We cannot multi-thread Python functions...
I've been thinking a lot about this and I'm planning a more general solution to problems like this. It will involve a map reduce style processing layer.
I don't think processing multiple columns simultaneously is the way forward. I'm going to focus on processing rows in a single chunk in parallel.
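To make the idea concrete, here is a minimal sketch of map-reduce-style processing over the rows of a single chunk. It is not the planned design, only an illustration: `partial_sums` and `parallel_groupby_sum` are hypothetical names, the chunk is assumed to be a plain list of `(key, value)` rows, and a group-by sum stands in for whatever aggregation is needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Map step: sum the values per key for one slice of rows."""
    sums = Counter()
    for key, value in rows:
        sums[key] += value
    return sums


def parallel_groupby_sum(chunk, n_workers=4):
    """Split one chunk into row slices, map them in parallel, then reduce."""
    step = max(1, -(-len(chunk) // n_workers))  # ceiling division
    slices = [chunk[i:i + step] for i in range(0, len(chunk), step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sums, slices))
    total = Counter()
    for part in partials:  # reduce step: merge the per-slice results
        total.update(part)
    return dict(total)
```

Because the map step here is pure Python, the threads still take turns holding the GIL, so this shows the structure rather than a speedup. A real gain would need the map step to release the GIL, which is the nogil concern raised above.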
On Sun, Oct 11, 2015 at 2:18 PM, Carst Vaartjes notifications@github.com wrote:
> I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil
> @FrancescElies there is not possible cython implementation right? as we cannot multi-thread python functions...
Reply to this email directly or view it on GitHub: https://github.com/visualfabriq/bquery/pull/53#issuecomment-147237227
Let's see. I'm really open to suggestions, but the problem is that a few of our algorithms depend on the processing order, so parallelism across blocks that disturbs that order can cause issues :/
P.S. Why do you think column parallelism is a bad idea?
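As an illustration of the ordering concern, here is a small sketch of an order-dependent algorithm. It assumes a factorize-style step that hands out integer codes in first-seen order; `factorize` and `factorize_blocks` are hypothetical helpers written for this example, not the functions used in this repository.

```python
def factorize(values, lookup=None):
    """Assign integer codes in first-seen order, reusing an existing lookup."""
    lookup = {} if lookup is None else lookup
    codes = [lookup.setdefault(v, len(lookup)) for v in values]
    return codes, lookup


def factorize_blocks(blocks):
    """Factorize blocks one after another, sharing a single lookup table."""
    lookup = {}
    out = []
    for block in blocks:
        codes, lookup = factorize(block, lookup)
        out.append(codes)
    return out, lookup


blocks = [["b", "a"], ["a", "c"], ["c", "b", "d"]]

# Same blocks, different processing order: the codes no longer agree.
in_order, lookup_in_order = factorize_blocks(blocks)
reordered, lookup_reordered = factorize_blocks(blocks[::-1])
```

With the blocks in their original order, `"b"` gets code 0. With the order reversed, `"c"` does. Any scheme that lets blocks finish in an arbitrary order would need an extra step to make the codes deterministic again.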
> ...no cython implementation possible/planned?

I'm not sure I understand the question.
Sorry, let me explain: iterblocks is currently defined as a normal Python function here: https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py
Unless it is a Cython function declared nogil, we cannot call it from multiple threads, which is why we cannot use it (even though it makes the code much nicer, as you can see in this pull request).
Not that I'm aware of, but you could ask on the mailing list.
On Sun, Oct 11, 2015 at 10:01 PM, Carst Vaartjes notifications@github.com wrote:
> Sorry! to explain myself: the iterblocks is currently defined here as a normal python function: https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py
> unless it's a nogil defined cython function, we cannot call it multi-threaded, so that's why we cannot use it (even though it makes the code much nicer as you can see in this pull request)
Simplify the cython code with bcolz.iterblocks, as in @FrancescElies PR
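For readers unfamiliar with block-wise iteration, here is a rough pure-Python sketch of the pattern. `iter_blocks` is a hypothetical stand-in written for this example: it works on a plain list, whereas bcolz.iterblocks operates on bcolz containers, so treat it as an illustration of the loop shape only.

```python
def iter_blocks(seq, blen):
    """Yield consecutive slices of `seq`, each at most `blen` items long."""
    if blen <= 0:
        raise ValueError("blen must be positive")
    for start in range(0, len(seq), blen):
        yield seq[start:start + blen]


def blockwise_sum(seq, blen=3):
    """Aggregate block by block instead of tracking chunk offsets by hand."""
    return sum(sum(block) for block in iter_blocks(seq, blen))
```

The appeal is that the caller no longer manages chunk boundaries or leftover rows. The last block is simply shorter than the rest.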