Open nsmith- opened 4 years ago
It's valid and it should be equivalent to the union of the two masks (from maskedwhen=True). I had thought there was logic to say that OptionType(OptionType(X)) is an equal type to OptionType(X); I put in a few of these algebraic things, but that's a rabbit hole.
Yeah, it's true:
>>> import awkward, numpy
>>> array = awkward.MaskedArray([False, True, False, True, False, True],
... awkward.MaskedArray([False, False, False, True, True, True],
... [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]))
>>> # checkerboard unions with half-and-half
>>> array
<MaskedArray [1.1 None 3.3 None None None] at 0x78d638ac5a90>
>>> # two levels deep
>>> array.type
ArrayType(6, OptionType(OptionType(dtype('float64'))))
>>> # is equivalent to one level deep
>>> array.type == awkward.type.ArrayType(6,
... awkward.type.OptionType(numpy.dtype("float64")))
True
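For readers without the library installed, the "union of the two masks" claim can be checked with plain NumPy. This is only a sketch: it assumes True means "masked out" (the maskedwhen=True convention mentioned above), and the variable names are made up for illustration.

```python
import numpy as np

# Same masks and content as in the session above; True means "masked out".
outer_mask = np.array([False, True, False, True, False, True])
inner_mask = np.array([False, False, False, True, True, True])
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])

# A value is visible only if neither layer hides it,
# so the single-layer equivalent is the union (logical OR) of the masks.
combined_mask = outer_mask | inner_mask

# Render masked entries as None, the way the MaskedArray repr does.
flattened = [None if hidden else float(x)
             for hidden, x in zip(combined_mask, content)]
print(flattened)
```

The result lines up with the repr shown above: only the first and third entries survive both layers.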
I have to decide how much of that should survive into the new era. One good thing about reimplementation is that stuff that seemed like a good idea at the time but never actually got used goes away. Users won't be encouraged to make their own array structures anymore, so I guess I don't need to police it. I guess you've found that pad needs to be smarter: if it's already looking at a MaskedArray, it should add to its mask, rather than introduce another layer.
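To make the "add to its mask" idea concrete, here is a rough sketch in plain Python. The Masked class and this pad function are hypothetical stand-ins written for this comment, not the awkward implementation, and the filler value for padded slots is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Masked:
    """Hypothetical one-level masked array (not awkward.MaskedArray)."""
    mask: List[bool]      # True means the entry is missing
    content: List[float]


def pad(array, length, filler=0.0):
    """Pad to at least `length` entries, marking the new entries as missing."""
    if isinstance(array, Masked):
        # Already masked: reuse its mask instead of wrapping it again.
        mask, content = list(array.mask), list(array.content)
    else:
        content = list(array)
        mask = [False] * len(content)
    extra = max(0, length - len(content))
    return Masked(mask + [True] * extra, content + [filler] * extra)


once = pad([1.0, 2.0], 3)
twice = pad(once, 5)
print(twice.mask)
```

Padding twice still yields a single Masked layer whose content is a flat list, so the type would stay one option level deep.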
Also seeking opinions: I want to change the name from "MaskedArray" to something else because of how often we use the word "mask" to refer to slicing with a boolean array, a concept that's similar enough to, but different from, what MaskedArray does to cause confusion. "Masked" is what NumPy calls it, though maybe it's a bad thing to use a similar word for not-really-the-same classes (numpy.ma.MaskedArray isn't interchangeable with awkward.MaskedArray: the latter can contain jagged data, for instance). Besides, "masked" describes the how, not the what.
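The two meanings of "mask" can be put side by side using only NumPy. This is a small illustration with made-up data; it uses numpy.ma.MaskedArray rather than awkward.MaskedArray, since the NumPy class is the one whose behavior is not in doubt here.

```python
import numpy as np

data = np.array([1.1, 2.2, 3.3, 4.4])
flags = np.array([False, True, False, True])

# "Mask" as slicing: KEEP the entries where the boolean array is True.
sliced = data[flags]

# "Mask" as in numpy.ma: HIDE the entries where the mask is True,
# while the array keeps its original length.
masked = np.ma.MaskedArray(data, mask=flags)

print(sliced.tolist())   # the two selected values
print(len(masked))       # still four entries
print(masked.tolist())   # hidden entries come back as None
```

The same boolean array gives opposite results in the two senses, which is the confusion the rename is meant to avoid.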
It seems to me that we have two other words for this, "nullable" and "optional." "Nullable" is an SQL term and "optional" or "option" is popular among modern programming languages. Haskell uses "maybe." I'm leaning toward
MaskedArray → BoolOptionalArray
BitMaskedArray → BitOptionalArray
IndexedMaskedArray → IndexedOptionalArray
(I'm not ignoring your other issue, #217; it just looks more difficult at the moment.)
I managed to end up with something like [...] which gives <JaggedArray [[None] [] [1 None 3]] at 0x000111048690>, and then proceeded to select some index inside the array with [...], leaving me <MaskedArray [None None 3] at 0x00013a865750>. All good so far, but the type is very strange: ArrayType(3, OptionType(OptionType(dtype('int64'))))
I don't understand what nested OptionType means. I can collapse it at least: af[~af.boolmask()].content.content returns array([3]).
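One way to read the nested type: an optional value that is itself optional is still just "a value or None", so the two layers carry no more information than one. The sketch below illustrates that reading with a hypothetical Option wrapper and normalize helper; neither is part of awkward.type, and this is not how the library compares types internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical option-type wrapper (not awkward.type.OptionType)."""
    inner: object


def normalize(tp):
    """Collapse directly nested Option layers into a single layer."""
    if isinstance(tp, Option):
        inner = normalize(tp.inner)
        # Option(Option(X)) carries the same information as Option(X).
        return inner if isinstance(inner, Option) else Option(inner)
    return tp


nested = Option(Option("int64"))
flat = normalize(nested)
print(flat)
```

Normalizing an already flat type leaves it unchanged, so the helper can be applied before any equality comparison.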