[FEATURE REQUEST] Implement numpy.ma

water5 commented 2 years ago

https://numpy.org/doc/stable/reference/maskedarray.generic.html

numpy.ma provides a number of functions. Could we implement a few of them, e.g., numpy.ma.array, numpy.ma.ones, numpy.ma.empty, numpy.ma.arange, and numpy.ma.masked_where?

import numpy.ma as ma
a = ma.arange(25).reshape(5, 5)
a

masked_array( data=[[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]], mask=False, fill_value=999999)

a.mask = a > 7
a

masked_array( data=[[0, 1, 2, 3, 4], [5, 6, 7, --, --], [--, --, --, --, --], [--, --, --, --, --], [--, --, --, --, --]], mask=[[False, False, False, False, False], [False, False, False, True, True], [ True, True, True, True, True], [ True, True, True, True, True], [ True, True, True, True, True]], fill_value=999999)

a *= 10
a

masked_array( data=[[0, 10, 20, 30, 40], [50, 60, 70, --, --], [--, --, --, --, --], [--, --, --, --, --], [--, --, --, --, --]], mask=[[False, False, False, False, False], [False, False, False, True, True], [ True, True, True, True, True], [ True, True, True, True, True], [ True, True, True, True, True]], fill_value=999999)

a.mask = ma.nomask
a

masked_array( data=[[0, 10, 20, 30, 40], [50, 60, 70, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]], mask=[[False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False]], fill_value=999999)

ma.masked_where(a > 8, a)

masked_array( data=[[0, --, --, --, --], [--, --, --, 8, --], [--, --, --, --, --], [--, --, --, --, --], [--, --, --, --, --]], mask=[[False, True, True, True, True], [ True, True, True, False, True], [ True, True, True, True, True], [ True, True, True, True, True], [ True, True, True, True, True]], fill_value=999999)

v923z commented 2 years ago

@water5 But can't you achieve the same thing via Boolean indexing? You just want to get rid of missing detector data, so that the missing data don't mess up any subsequent calculations, right?

from ulab import numpy as np
a = np.array([1, 2, 3, -1, 5])
sum(a[a > 0])  # sum only the valid (positive) entries

I don't quite see where masked arrays would have an advantage (beyond convenience) over Boolean indexing.
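A minimal sketch of the Boolean-indexing alternative, replaying the masked `a *= 10` step from the request above with a plain Boolean array instead of numpy.ma. It is written against CPython numpy; that ulab supports `~` on Boolean arrays and in-place updates through a Boolean index in the same way is an assumption that is not verified here.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# True marks the entries to ignore, mirroring a.mask = a > 7 in numpy.ma.
mask = a > 7
valid = ~mask

# Update only the unmasked entries; the masked ones keep their old values.
a[valid] *= 10

# Reduce over the unmasked entries only.
total = a[valid].sum()
print(total)
```

The mask has to be carried around and applied by hand at every step, which is the convenience that numpy.ma would otherwise provide.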

In principle, I am not against the idea, but since this is quite a significant undertaking, I cannot assign high priority to it. Also, there are quite a few functions in numpy that we haven't yet implemented, but if we implemented everything, then we would have exactly that: numpy. I would like to reiterate one of the first sentences of the user manual:

ulab implements a small subset of numpy and scipy. The functions were chosen such that they might be useful in the context of a microcontroller.

We never wanted to produce a one-to-one copy of numpy, and I think it wouldn't make much sense. If you want numpy, then use numpy. Given the manpower that we have here, we have to be very selective as to what we implement, and how.

water5 commented 2 years ago

numpy.isin could be an alternative to numpy.ma: https://numpy.org/doc/stable/reference/generated/numpy.isin.html Which would be easier to implement, numpy.isin or numpy.ma.*?

a = np.arange(9).reshape((3, 3))
a

array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=int16)
test_element = [1, 3, 6, 8]
test_element
[1, 3, 6, 8]
mask_ = np.isin(a, test_element)
mask_
array([[False, True, False], [ True, False, False], [ True, False, True]])
a[mask_]
array([1, 3, 6, 8])
a[mask_] *= 10
a
array([[ 0, 10, 2], [30, 4, 5], [60, 7, 80]])
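For reference, here is a rough pure-Python sketch of what the session above computes, using nested lists so that it runs without numpy or ulab. The helper names `isin_2d` and `scale_where` are hypothetical and exist only for this illustration; they are not part of either library.

```python
def isin_2d(rows, test_elements):
    """Element-wise membership test over a list of rows (hypothetical helper)."""
    lookup = set(test_elements)
    return [[value in lookup for value in row] for row in rows]


def scale_where(rows, mask, factor):
    """Return a copy of rows with the entries flagged in mask multiplied by factor."""
    return [
        [value * factor if flagged else value for value, flagged in zip(row, mask_row)]
        for row, mask_row in zip(rows, mask)
    ]


a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
mask_ = isin_2d(a, [1, 3, 6, 8])
scaled = scale_where(a, mask_, 10)
print(mask_)
print(scaled)
```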

However, using numpy.isin this way requires an operation like the one below, which ulab.numpy does not currently implement:
a = np.arange(5)
a
array([0, 1, 2, 3, 4], dtype=int16)
mask_ = [1, 3]
a[mask_]
Nothing is printed.
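Until indexing with a list of positions is available, one possible workaround is to pick the elements one at a time. The sketch below uses a plain list so that it runs anywhere; `pick` is a hypothetical helper, and wrapping its result in np.array(...) under ulab is an assumption rather than something tested here.

```python
def pick(values, indices):
    """Return the elements of a 1-D sequence at the given integer positions."""
    return [values[i] for i in indices]


a = list(range(5))
picked = pick(a, [1, 3])
print(picked)
```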

I saw issue https://github.com/v923z/micropython-ulab/issues/487 and pull request https://github.com/v923z/micropython-ulab/pull/488. Will the operation above be supported once that work is done?

v923z commented 2 years ago

@water5 isin is definitely easier to implement. However, when you iterate over the rows of an array, you actually get a view, i.e., if you manipulate the row, you are, in effect, manipulating the original array. Would

from ulab import numpy as np

a = np.array(range(25)).reshape((5, 5))
test_elements = [3, 6, 7, 8]

for row in a:
    for i in range(a.shape[1]):
        value = row[i]
        if value in test_elements:
            row[i] = 10 * value

print(a)

be unacceptably slow?
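If the membership test turns out to dominate the run time, one inexpensive tweak is to convert the test elements to a set once, so that each lookup no longer scans the whole list. This is only a sketch: `scale_matching` is a hypothetical helper, it is exercised here on nested lists rather than on a ulab array, and any actual speed-up on a microcontroller is an assumption that would need measuring.

```python
def scale_matching(a, test_elements, factor=10):
    """Multiply, in place, every entry of a that occurs in test_elements."""
    lookup = set(test_elements)  # built once, reused for every element
    for row in a:
        for i in range(len(row)):
            value = row[i]
            if value in lookup:
                row[i] = factor * value
    return a


# Nested lists stand in for the 5x5 array used above.
grid = [list(range(r * 5, r * 5 + 5)) for r in range(5)]
scale_matching(grid, [3, 6, 7, 8])
print(grid)
```

Because the rows are modified in place, the same loop body should also work on any container whose rows are views into the original data, which is the behaviour described above for iterating over an array.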

v923z / micropython-ulab

[FEATURE REQUEST] Implement numpy.ma #490
