scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License

counts on array.from_offsets #230

Closed: andrzejnovak closed this issue 4 years ago

andrzejnovak commented 4 years ago

I have a doubly jagged array (built with from_offsets, if it matters), like this:

Actual array, shape (35580,)
[[[<Row 0> <Row 2>] [<Row 1> <Row 3>]] [[<Row 4> <Row 6>] [<Row 5> <Row 7>]] [[<Row 8> <Row 11>] [<Row 9> <Row 12>] [<Row 10> <Row 13>]] ... [[<Row 165497> <Row 165500>] [<Row 165498> <Row 165499>]] [[<Row 165501> <Row 165504>] [<Row 165502> <Row 165503>] [<Row 165505> <Row 165506>]] [[<Row 165507> <Row 165510>] [<Row 165508> <Row 165509>] [<Row 165511> <Row 165512>]]]

I would like to get counts along the inner dimension, but .counts returns only the top-level ones:

array.counts, len 35580
[2 2 3 ... 2 3 3]

Is there already a way to get counts along the inner dimension? If not, an API like counts(depth=n) would be cool, maybe also with a special keyword like counts(depth='last').
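Background on why each jagged level has its own counts: counts are just the differences between consecutive offsets, and a doubly jagged array has two sets of offsets. The sketch below is plain Python with toy data (not the issue's arrays, and not the awkward API). `counts_from_offsets`, `to_nested`, and `counts_at_depth` are hypothetical helpers; `counts_at_depth` only illustrates what the proposed counts(depth=n) might return for a two-level structure.

```python
# Toy flat content (hypothetical; the issue's content is a Table of rows).
content = list(range(10))

# Inner offsets: each consecutive pair delimits one innermost list.
inner_offsets = [0, 2, 4, 6, 7, 10]
# Outer offsets index into the inner lists: each pair delimits one outer entry.
outer_offsets = [0, 2, 5]


def counts_from_offsets(offsets):
    """Counts are the differences between consecutive offsets."""
    return [stop - start for start, stop in zip(offsets[:-1], offsets[1:])]


def to_nested(content, outer_offsets, inner_offsets):
    """Materialize the two offset levels as nested Python lists."""
    inner = [content[a:b] for a, b in zip(inner_offsets[:-1], inner_offsets[1:])]
    return [inner[a:b] for a, b in zip(outer_offsets[:-1], outer_offsets[1:])]


def counts_at_depth(outer_offsets, inner_offsets, depth):
    """Hypothetical counts(depth=n) for a two-level jagged structure."""
    if depth == "last":
        depth = 1  # in this two-level sketch, the innermost level is depth 1
    if depth == 0:
        return counts_from_offsets(outer_offsets)
    if depth == 1:
        inner_counts = counts_from_offsets(inner_offsets)
        # Regroup the flat inner counts using the outer offsets.
        return [inner_counts[a:b] for a, b in zip(outer_offsets[:-1], outer_offsets[1:])]
    raise ValueError("this sketch only has depths 0 and 1")


print(to_nested(content, outer_offsets, inner_offsets))  # [[[0, 1], [2, 3]], [[4, 5], [6], [7, 8, 9]]]
print(counts_at_depth(outer_offsets, inner_offsets, 0))  # [2, 3]
print(counts_at_depth(outer_offsets, inner_offsets, "last"))  # [[2, 2], [2, 1, 3]]
```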

jpivarski commented 4 years ago

You want count() (singular, with parentheses). This was a poor design choice: count() is a reducer that operates at axis=-1 (innermost) and counts is a static array of essentially the same thing at axis=0. They're getting combined into a single function named ak.count(array) in version 1.0.

>>> import awkward
>>> awkward.fromiter([[[1, 2, 3], [], [4, 5]], [], [[6]]])
<JaggedArray [[[1 2 3] [] [4 5]] [] [[6]]] at 0x7f7f4a2c7d10>
>>> array = awkward.fromiter([[[1, 2, 3], [], [4, 5]], [], [[6]]])
>>> array.count()
<JaggedArray [[3 0 2] [] [1]] at 0x7f7f2baa4cd0>
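The difference between the two can be mimicked on plain nested Python lists. This is a sketch, not awkward's implementation: `outer_counts` plays the role of .counts (lengths at axis=0), and `count_innermost` plays the role of the count() reducer (lengths at the innermost axis). Both names are hypothetical. Plain lists do not record their nesting depth, so the sketch passes it in as `levels`; without it, an empty outer entry like [] could not be told apart from an empty innermost list.

```python
array = [[[1, 2, 3], [], [4, 5]], [], [[6]]]


def outer_counts(nested):
    """Analogue of .counts: one length per outermost entry (axis=0)."""
    return [len(sub) for sub in nested]


def count_innermost(nested, levels):
    """Analogue of .count(): replace each innermost list by its length.

    `levels` is how many list levels `nested` has (3 for `array`).
    """
    if levels == 1:
        # Innermost list: reduce it to a single number.
        return len(nested)
    # Otherwise keep this level's structure and recurse one level down.
    return [count_innermost(sub, levels - 1) for sub in nested]


print(outer_counts(array))        # [3, 0, 1]
print(count_innermost(array, 3))  # [[3, 0, 2], [], [1]]
```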
andrzejnovak commented 4 years ago

It doesn't like that: ValueError: some Table columns are jagged and others are not

jpivarski commented 4 years ago

Project to columns first:

array["column_name"].count()

If you need to wrap it up into a Table again, you can use its constructor:

awkward.Table({"column_name": array["column_name"].count(), ...})

which... perhaps... ought to have been automatic. Awkward 0.x operations are not as general (applying equally to all types) as they should be.
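The project-then-rewrap workaround can be sketched with a dict of columns standing in for awkward.Table (an assumption for illustration; the real Table is not a dict). Only the jagged column is reduced, and the flat column is carried over unchanged when the new table is built. Column names and values are hypothetical.

```python
# Stand-in for an awkward.Table: equal-length columns keyed by name.
events = {
    "pt": [[[50.0, 20.0], [30.0]], [], [[10.0, 5.0, 2.0]]],  # jagged inside jagged
    "weight": [1.0, 0.5, 2.0],                               # one value per event
}


def count_innermost(nested, levels):
    """Replace each innermost list with its length (levels = list depth)."""
    if levels == 1:
        return len(nested)
    return [count_innermost(sub, levels - 1) for sub in nested]


# Project to the jagged column first, then reduce only that column.
pt_counts = count_innermost(events["pt"], 3)

# Wrap the result back into a "table", carrying the flat column along unchanged.
counted = {"pt": pt_counts, "weight": events["weight"]}

print(counted["pt"])  # [[2, 1], [], [3]]
```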

andrzejnovak commented 4 years ago

Ah, I see. I can just use any column to do this:

jets[jets.subjets.pt.count() >= 2].subjets[:, :, 0]

Not great for readability, but it seems to do the right thing. Thanks.
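The selection above can be read step by step with plain nested lists: mask jets by their subjet count, then take subjet 0 of each surviving jet, which is what [:, :, 0] does. This is a hedged sketch with made-up subjet pT values, not awkward code; `select_leading_subjet` and `min_subjets` are hypothetical names. A side effect of the mask is that index 0 exists for every jet that survives it.

```python
# Hypothetical subjet pT values, nested as events -> jets -> subjets.
subjet_pt = [
    [[50.0, 20.0], [30.0]],          # event 0: jet 0 has 2 subjets, jet 1 has 1
    [],                              # event 1: no jets
    [[10.0, 5.0, 2.0], [8.0, 4.0]],  # event 2: both jets have >= 2 subjets
]


def select_leading_subjet(subjet_pt, min_subjets=2):
    """Mimic jets[jets.subjets.pt.count() >= 2].subjets[:, :, 0] on lists."""
    result = []
    for jets in subjet_pt:
        # Per-jet mask, like count() >= min_subjets: keep jets with enough subjets.
        kept = [subjets for subjets in jets if len(subjets) >= min_subjets]
        # Take subjet 0 of each kept jet, like [:, :, 0].
        result.append([subjets[0] for subjets in kept])
    return result


print(select_leading_subjet(subjet_pt))  # [[50.0], [], [10.0, 8.0]]
```

The event structure is preserved: event 1 has no jets and stays an empty list rather than being dropped.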

jpivarski commented 4 years ago

Great! I'm going to close this, but if you still have issues, I can reopen it.