v923z / micropython-ulab

a numpy-like fast vector module for micropython, circuitpython, and their derivatives
https://micropython-ulab.readthedocs.io/en/latest
MIT License

[FEATURE REQUEST] add indexing via array #607

Closed. Derfies closed this issue 5 months ago.

Derfies commented 1 year ago

I would like the ability to use one ndarray to index into another one, aka “Advanced indexing” (https://numpy.org/doc/stable/user/basics.indexing.html)

e.g.:

```python
>>> x = np.arange(10, 1, -1)
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
```

v923z commented 1 year ago

Would one-dimensional indexing suffice? You could still index higher-dimensional arrays; I am asking about the dimension of the indexing array.

Derfies commented 1 year ago

My specific use case is essentially as above: a simple 1-to-1 lookup table. So, short answer: yes, absolutely!

hamza-712 commented 1 year ago

Hi, could you please implement indexing with a 1D array?

s-ol commented 9 months ago

I would also find array indexing really useful, in my case to reorder elements. To simplify, this could/should perhaps be limited to uint*-valued index arrays; 1D would be a good start, I guess.
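For clarity, this is the numpy behaviour I'm after (shown with standard numpy; ulab doesn't support it yet):

```python
import numpy as np  # standard numpy; in ulab this would be `from ulab import numpy as np`

x = np.array([10, 20, 30])
order = np.array([2, 0, 1])
print(x[order])  # array([30, 10, 20]) -- elements picked in index order
```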

s-ol commented 9 months ago

Here's a diff that implements indexing by u/int*-valued arrays. The indexing array can be multi-dimensional, but the indexed array needs to be linear. Note that there is also no range checking whatsoever.

Instead of "wasting" cycles by checking for errors in the tight loop, maybe the indices could be taken mod the length of the indexed array, with negative indices being counted from the back. That way it at least constitutes a "feature".

As a library user, I'd use an "unsafe" version if I could for more performance, but I'm not sure if that is aligned with the spirit of this library.

index.patch
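For anyone wanting the behaviour today, here is a minimal pure-Python sketch of the mod-wrapping idea (take_wrapped is a hypothetical helper, not part of the patch, which is written in C):

```python
from ulab import numpy as np

def take_wrapped(arr, idx):
    # Hypothetical helper: indices are taken modulo len(arr), so negative
    # and out-of-range values wrap around instead of raising an error.
    n = len(arr)
    return np.array([arr[int(i) % n] for i in idx])

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2])
print(take_wrapped(x, np.array([3, -1, 11])))  # elements 7, 2, 8
```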

v923z commented 9 months ago

Thanks for taking a stab at it!

> Here's a diff that implements indexing by u/int*-valued arrays. The indexing array can be multi-dimensional, but the indexed array needs to be linear. Note that there is also no range checking whatsoever.

I think there should be some sort of range checking, and it should happen at the very beginning. There is no point in reserving space for an array if it turns out that it cannot be filled.

> Instead of "wasting" cycles by checking for errors in the tight loop, maybe the indices could be taken mod the length of the indexed array, with negative indices being counted from the back. That way it at least constitutes a "feature".

numpy is quite clear on what an index means, and we shouldn't break with that.

> As a library user, I'd use an "unsafe" version if I could for more performance, but I'm not sure if that is aligned with the spirit of this library.

I would be OK with partial features, e.g., implementing something for 1D arrays only and bailing out if you pass a 2D array, but things that lead to unexpected results should be avoided.

As you have realised, indexing by arrays is significantly harder than slicing, because it is relatively easy to inspect slices, while making sure that the values of an indexing array are within limits requires the inspection of the full array.

1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add it. If your main concern is the time that could be lost checking the indexing array, an alternative would be to implement this feature as a function where you could pass a keyword argument to skip the error-checking step. I don't think that there is such a function in numpy, so we wouldn't have to care about compatibility.
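To make that concrete, a rough Python sketch of the semantics I have in mind (the name take_checked and the keyword check are placeholders; nothing is settled, and the real implementation would be C):

```python
from ulab import numpy as np

def take_checked(arr, idx, check=True):
    # Hypothetical reference semantics: all indices are validated up
    # front; pass check=False to skip the validation pass when you
    # already know the indices are in range.
    if check:
        n = len(arr)
        for i in idx:
            if not -n <= int(i) < n:
                raise IndexError("index out of range")
    return np.array([arr[int(i)] for i in idx])

a = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2])
print(take_checked(a, np.array([3, -1, 0])))  # elements 7, 2, 10
```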

s-ol commented 9 months ago

> numpy is quite clear on what an index means, and we shouldn't break with that.

I haven't used numpy extensively, so I had to look it up. For reference, here's how 'advanced indexing' works there: integer-array indexing returns a copy whose shape matches that of the index array, negative indices count from the end of the indexed axis, and out-of-bounds indices raise an IndexError.

That seems to gel quite well with your proposal of checking bounds up-front: a first pass could do bounds-checking, conversion of negative indices, and maybe even account for non-dense source-array strides.

> 1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add it.

I think 1D source arrays would be a great addition, and anything else can be accomplished anyway using .flatten() on the input. It seems from my attempt that supporting multiple dimensions for the indexing array (and therefore the output) comes more or less "for free", so that could be included, IMO.
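For example (standard numpy shown; this is the behaviour the proposal would follow):

```python
import numpy as np  # standard numpy semantics

a2 = np.array([[1, 2], [3, 4]])
flat = a2.flatten()              # 1D copy of the data: array([1, 2, 3, 4])
idx = np.array([[3, 0], [1, 1]])
print(flat[idx])                 # array([[4, 1], [2, 2]]) -- the output
                                 # takes the shape of the index array
```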

> an alternative would be to implement this feature as a function where you could pass a keyword argument to skip the error-checking step

I think this is mostly a decision of project direction and scope. If the goal of ulab is to be numpy-compatible and enable the reuse of near-verbatim numpy code on embedded platforms, features like this are not necessary.

I myself ended up using ulab from a different perspective, though. I have a need to do some calculations involving "mid-size" buffers (50-150 elements) at a comparatively high rate in CircuitPython, on a platform that has no hardware floating-point support. I also want this part of my codebase to be easily extensible by users, so if possible I would like to avoid moving my code into a custom C library.

It could definitely be argued that my needs would be met better by a dedicated library that focuses on high-performance fixed-point array primitives, but there is significant overlap with the features that ulab provides today. Features like a safety opt-out or out= arguments would be very welcome for this type of application.
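For context, out= is numpy's convention for writing results into a preallocated buffer instead of allocating a new array, which matters on memory-constrained targets:

```python
import numpy as np  # standard numpy; shown for reference

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
out = np.zeros(3)
np.add(a, b, out=out)  # result written into the preallocated buffer
print(out)             # array([5., 7., 9.])
```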

v923z commented 9 months ago

> That seems to gel quite well with your proposal of checking bounds up-front: a first pass could do bounds-checking, conversion of negative indices, and maybe even account for non-dense source-array strides.

And that brings up the first question: since negative indices have to be converted, you can't do this without either reserving space for the converted indices or doing the conversion twice. Which one should it be?
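To make the trade-off concrete, a rough Python sketch of the two options (the real implementation would be C; the names are illustrative only):

```python
def validate_buffered(idx, n):
    # Option 1: a single pass converts negative indices and checks bounds,
    # storing the results in a scratch buffer (O(len(idx)) extra memory).
    buf = []
    for i in idx:
        j = int(i) + n if int(i) < 0 else int(i)
        if not 0 <= j < n:
            raise IndexError("index out of range")
        buf.append(j)
    return buf  # the copy loop then reads the converted indices from here

def validate_twice(idx, n):
    # Option 2: check only, store nothing; the copy loop later repeats the
    # negative-index conversion (a second pass, but no extra memory).
    for i in idx:
        if not -n <= int(i) < n:
            raise IndexError("index out of range")
```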

> > 1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add it.
>
> I think 1D source arrays would be a great addition, and anything else can be accomplished anyway using .flatten() on the input. It seems from my attempt that supporting multiple dimensions for the indexing array (and therefore the output) comes more or less "for free", so that could be included, IMO.

Can you create a PR with this, so that we can iron out the details? I'm a bit pessimistic about the higher-dimensional cases, but I might have overlooked something.

> > an alternative would be to implement this feature as a function where you could pass a keyword argument to skip the error-checking step
>
> I think this is mostly a decision of project direction and scope. If the goal of ulab is to be numpy-compatible and enable the reuse of near-verbatim numpy code on embedded platforms, features like this are not necessary.

I'm all for numpy compatibility wherever possible, but there are multiple issues that one has to deal with on a microcontroller. Pretty much all are related to the scarcity of resources, and there were a couple of instances where we had to break with compatibility, simply because there was no reasonable way of implementing a feature within the constraints of an MCU.

> I myself ended up using ulab from a different perspective, though. I have a need to do some calculations involving "mid-size" buffers (50-150 elements) at a comparatively high rate in CircuitPython, on a platform that has no hardware floating-point support. I also want this part of my codebase to be easily extensible by users, so if possible I would like to avoid moving my code into a custom C library.

ulab was meant to be extensible (https://micropython-ulab.readthedocs.io/en/latest/ulab-programming.html), though I've got to admit that, to my knowledge, the idea never gained traction.

> It could definitely be argued that my needs would be met better by a dedicated library that focuses on high-performance fixed-point array primitives, but there is significant overlap with the features that ulab provides today.

If that's indeed the case, then why don't you consider implementing your features on top of ulab, simply as a user module?

Derfies commented 6 months ago

Have there been any further developments on this? I'm in a situation where leveraging fancy indexing, or the equivalent of np.take, would be seriously advantageous. Or is there a way to simulate this behaviour with a combination of other commands?

v923z commented 6 months ago

I would actually be in favour of an implementation of np.take, because it would be easier to configure in the pre-processing step: if you don't need it, you just disable the function altogether. It would also be easier to start with a not-so-complete implementation and keep track of the progress.

Based on your comment in https://github.com/v923z/micropython-ulab/issues/607#issuecomment-1532757073, np.take would be enough in most cases. I will try to come up with an implementation over the weekend. It would be relatively straightforward, and I don't foresee any major hiccups.
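For reference, this is numpy's np.take in its simplest form, which the ulab function would mirror:

```python
import numpy as np  # standard numpy; shown for reference

a = np.array([10, 9, 8, 7, 6])
idx = np.array([3, 0, -1])
print(np.take(a, idx))  # array([ 7, 10,  6]); equivalent to a[idx],
                        # out-of-range indices raise IndexError by default
```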

I'd also like to ask for a minimal test suite covering at least 2D arrays.
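Something along these lines, sketched against numpy's documented behaviour (hypothetical for now, since the ulab function does not exist yet):

```python
from ulab import numpy as np

a = np.array([[1, 2], [3, 4]])
# rows reordered along axis 0; expected output: [[3, 4], [1, 2]]
print(np.take(a, np.array([1, 0], dtype=np.uint8), axis=0))
# default axis=None operates on the flattened array; expected: [4, 1]
print(np.take(a, np.array([3, 0], dtype=np.uint8)))
```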

v923z commented 6 months ago

@Derfies Would it be OK if we closed the current issue once https://github.com/v923z/micropython-ulab/issues/661 is completed?

Derfies commented 5 months ago

@v923z those tickets look equivalent, so yes.

Do you still need tests written for this feature?

v923z commented 5 months ago

> @v923z those tickets look equivalent, so yes.

OK

> Do you still need tests written for this feature?

Yes, that would be useful. I'll try to complete the implementation in the near future.