peastman closed this 5 years ago.
Another option would be to change `nIdxToLinear()` and `linearToNIdx()` into methods of `NDArray`. Then they could access internal fields directly.
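Here is a minimal sketch of that suggestion. It assumes a row-major (C-order) layout and assumes that `widthOfDims()` holds the per-dimension strides. `DoubleNDArraySketch` and its private `shape`/`strides` fields are hypothetical, not koma's actual `NDArray` internals. The point is only that member functions can read primitive `IntArray` fields directly instead of going through `shape()` or `widthOfDims()`.

```kotlin
// Hypothetical, simplified stand-in for koma's NDArray; the class name and the
// private `shape`/`strides` fields are illustrative assumptions, not koma's API.
class DoubleNDArraySketch(vararg dims: Int) {
    // Stored once as a primitive IntArray, so no List<Int> is built per lookup.
    private val shape: IntArray = dims.copyOf()

    // Row-major strides: strides[i] is the product of all dimensions after i.
    private val strides: IntArray = IntArray(dims.size).also { s ->
        var acc = 1
        for (i in dims.indices.reversed()) {
            s[i] = acc
            acc *= dims[i]
        }
    }

    // As members, both conversions read the fields directly: no boxing, no copies.
    fun nIdxToLinear(indices: IntArray): Int {
        var linear = 0
        for (i in indices.indices) linear += indices[i] * strides[i]
        return linear
    }

    fun linearToNIdx(linear: Int): IntArray {
        var rest = linear
        return IntArray(shape.size) { i ->
            val idx = rest / strides[i]
            rest %= strides[i]
            idx
        }
    }
}
```

For a 2×3×4 array the strides are 12, 4 and 1, so `nIdxToLinear(intArrayOf(1, 2, 3))` gives 1·12 + 2·4 + 3 = 23, and `linearToNIdx(23)` recovers `[1, 2, 3]`. Bounds checks are deliberately left out of this sketch.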
Yeah, what's there is definitely a first cut and full of potential optimizations. My long-term plan is to have a benchmark suite that we run against all the backends on all 3 platforms, and to run it in CI so we can detect any regressions that affect a single platform. But that's currently stalled: koma-tests is not trivial to port, which is blocking #56, which is in turn blocked by #77, which is blocked by https://youtrack.jetbrains.net/issue/KT-27849.
In the meantime though, doing some initial passes that take out the obviously sub-optimal code is good. I'll take a look at #94 soon.
I've been looking into improving the performance of accessing arrays. Suppose that `array` is an `NDArray<Double>` and you look up an element by index, say `array[0,1]`. It turns out there's actually a huge overhead involved. Let me walk through what happens.

The `get()` function is simple enough: it just calls through to `getDouble()`, which in turn calls `safeNIdxToLinear()`. That first calls `checkIndices()`, which mostly looks reasonable, but there are a couple of hidden costs.

The first is `shape()`. Every call to `shape()` constructs a new list. That's important, because we'll be making multiple calls to it. Not only that, but it converts the primitive integers in `shape` into boxed `Int` objects, which then have to be unboxed when we access them. This is a lot of unnecessary overhead.

But we've hardly begun. Next comes `nIdxToLinear()`. That mostly looks efficient, except that `widthOfDims` is another `List<Int>`, so there's a cost to unboxing each element. But the call to `widthOfDims()` is where things really get complicated. It first calls `shape()`, which, as we saw, constructs a new `List` of boxed `Int`s. Then we immediately call `toList()`, which creates another `List`, and adds and removes some elements. Plus it calls `accumulateRight()`, which creates yet another list and does some more manipulations of it.

After all that, we finally have the linear index of the element to look up. That gets passed to `getDouble()` again, this time as a single linear index. That step is mostly inexpensive (`storage` is a `DoubleArray`, so no boxing is required), but we do pass the index through `checkLinearIndex()`. Remember, we already went through `checkIndices()`, which made sure the indices were legal. So this check is completely redundant.

Ok, what can we do to improve this? First, there's no need to recompute `widthOfDims()` every time. The content of that list never changes, so we could just compute it in the constructor and return it directly. Same with `shape()`: there's no need to create a new `List<Int>` every time it's called.

Another possibility we could consider is returning the shape and widths as `IntArray` rather than `List<Int>`. That gets a bit more dangerous, since arrays are mutable. It would have to be `internal` only, and even then it might introduce more risk of coding errors than we want. But it would improve performance by eliminating the need for unboxing.

We should also try to eliminate the duplicate range check. If we've already verified that the n-dimensional indices are valid, we don't need to check the corresponding linear index as well.
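The suggestions above could be combined roughly as follows. This is a minimal sketch that assumes a row-major layout and assumes `widthOfDims()` is the per-dimension stride list. `CachedNDArraySketch`, `shapeArr`, `widths` and `checkedLinear()` are hypothetical names, not koma's API. The shape list and widths are computed once in the constructor, the widths are exposed only as an `internal` `IntArray`, and each lookup validates the n-dimensional indices exactly once, with no second linear-index check.

```kotlin
// Hypothetical sketch; class and member names are illustrative, not koma's API.
class CachedNDArraySketch(vararg dims: Int) {
    private val shapeArr: IntArray = dims.copyOf()

    // Boxed once here instead of on every shape() call.
    private val shapeList: List<Int> = shapeArr.toList()

    // Row-major strides (assumed to be what widthOfDims() returns), computed once.
    // internal: usable without unboxing inside the module, hidden from library users.
    internal val widths: IntArray = IntArray(dims.size).also { w ->
        var acc = 1
        for (i in dims.indices.reversed()) {
            w[i] = acc
            acc *= dims[i]
        }
    }

    private val storage = DoubleArray(shapeArr.fold(1) { acc, d -> acc * d })

    // Returns the cached list; no new List<Int> per call.
    fun shape(): List<Int> = shapeList

    operator fun get(vararg indices: Int): Double = storage[checkedLinear(indices)]

    operator fun set(vararg indices: Int, value: Double) {
        storage[checkedLinear(indices)] = value
    }

    // Validates the n-dimensional indices once and computes the linear index in the
    // same loop. No checkLinearIndex()-style check follows; on the JVM, only the
    // array's own bounds check on `storage` remains.
    private fun checkedLinear(indices: IntArray): Int {
        require(indices.size == shapeArr.size) {
            "Expected ${shapeArr.size} indices but got ${indices.size}"
        }
        var linear = 0
        for (i in indices.indices) {
            val idx = indices[i]
            if (idx < 0 || idx >= shapeArr[i]) {
                throw IndexOutOfBoundsException(
                    "Index $idx is out of range for dimension $i of size ${shapeArr[i]}"
                )
            }
            linear += idx * widths[i]
        }
        return linear
    }
}
```

With this layout a lookup such as `a[1, 2]` makes one pass over the indices with no boxing, although the `vararg` call itself still allocates a small `IntArray` to hold them. The public `shape()` keeps returning a `List<Int>`, so outside callers never see the mutable array, while code in the same module can read `widths` directly.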