scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License
215 stars 39 forks source link

IndexError when using MaskedArray and flatten #208

Closed jrueb closed 5 years ago

jrueb commented 5 years ago

The following two lines raise an IndexError

a = awkward.MaskedArray([True, False], [1, 1])
print((a + 1).flatten())

This is however valid code, is it not? It seems there is a problem with the implementation of flatten for IndexedMaskedArray.

jpivarski commented 5 years ago

It does look like a bug—thanks for pointing this out. I might not be able to touch it in a couple of days, though. Note that this operation should have no effect—there's no jagged array to flatten, so this would just passed through.

jpivarski commented 5 years ago

(I got a chance to look at it anyway.) It looks like flattening a MaskedArray is defined in such a way as to eliminate all the None values. I don't think that's what the definition ought to be. It doesn't make sense for an operation to do such different things to lists and non-lists, particularly since the axis parameter only specifies which list depth it operates on; you can't control whether there are any masks at any level.

I'm going to fix it so that flatten just passes through MaskedArrays. If you want to remove None values, do a[a.boolmask(maskedwhen=False)] to use a MaskedArray's mask as a mask. (I also need to invent a new word to reduce redundancy in the terminology.)

jpivarski commented 5 years ago

After a bit more investigation, I see that this wasn't a bad definition—it was a bad implementation. In PR #209, I've fixed it. It should now work like this:

    def test_issue_208(self):
        a = awkward.MaskedArray([True, False, False, True, False, True], awkward.fromiter([[1, 2, 3], [4, 5], [6], [7, 8], [10, 11, 12], [999]]))
        assert a.flatten().tolist() == [None, 4, 5, 6, None, 10, 11, 12, None]
        assert (a + 100).flatten().tolist() == [None, 104, 105, 106, None, 110, 111, 112, None]
        a = awkward.MaskedArray([True, False, False, True, False, True], [1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
        assert a.flatten().tolist() == [None, 2.2, 3.3, None, 5.5, None]
        assert (a + 100).flatten().tolist() == [None, 102.2, 103.3, None, 105.5, None]

It will probably be a day or so before I can merge this PR, but you can check it out now.

jrueb commented 5 years ago

Looks good! Thanks