Closed martindurant closed 2 months ago
That _merge
function is going to be Numba'd, right?
There is an ak.numba.GrowableBuffer that you can use instead of over-allocating and trimming. (Does trim really release the memory for other objects, or is it just a slice that holds a reference to the original allocation as its base
?) This was part of a LayoutBuilder implementation in Numba (scikit-hep/awkward#2408) that never ended up being faster than ArrayBuilder in Numba, and we don't know why.
In principle, a numba.typed.List of numbers is the same kind of thing as a GrowableBuffer, and presumably this is better optimized than our GrowableBuffer. It could be because our GrowableBuffer passes around two array objects, which are separately reference-counted, while Numba's typed list would have only one reference count.
In both the Numba-LayoutBuilder project and yours, we're seeing the dynamic ArrayBuilder being faster than it's supposed to be. It's even accessed through external pointers (no possibility for LLVM to optimize), so I really don't know what's going on there. It shouldn't be too hard to beat, if there aren't other bottlenecks elsewhere, hiding the performance of this part.
Does trim really release the memory for other objects
Yes it does: it must own the memory and the memory must be contiguous. You would pass a flag asserting that no other reference points to the same array (although python references would be fine and updated in place).
we're seeing the dynamic ArrayBuilder being faster than it's supposed to be
do you think there's any point in this PR, then? At scale, the over-allocation would be costly, so I would need one of the growable implementations anyway.
I don't think there's anything wrong with the PR. In principle, one should be able to do better than ArrayBuilder. I've always taken it as some issue hiding in the implementation of ak.numba.LayoutBuilder
, but your project is independent of that. (And I'm assuming that you'll use some growable buffer, eventually.)
As an alternative to the alternative, can the merge be implemented by creating "tags" and "index" arrays of integers using Numba and then pass them into ak.contents.UnionArray.simplified to mix them appropriately? The return value of this function is not necessarily a union array; it tries to homogenize types.
As an alternative to the alternative
Maybe next time! I think this is nice for now.
@jpivarski , the code in akimbo.io includes, for now, both the previous
builder
version and this new version of the JOIN algorithm, for you to look at. While thebuilder
is very dynamic and ought to be slower, it also features a growable buffer for the index data, whereas the new version is allocated as big as it can possibly be (it could beresize
d when finished). Is it possible to use a growable buffer here without the builder?The merge function takes a trivial amount of time compared to the other things going on in the join function, so I am not sure if this is actually any faster at all. Am I missing something?