scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

'numpy.histogram2d' implementation for awkward.highlevel.Array #1096

Closed uzzielperez closed 3 years ago

uzzielperez commented 3 years ago

Description of new feature

Hi,

I am trying to make a simple 2D histogram:

import uproot
import awkward as ak
import matplotlib.pyplot as plt

# Read every branch of the event tree into Awkward Arrays
fNtuple = uproot.open('nTuple_GGJets_Pt-15_13TeV-sherpa_evt71999.root')
tree = fNtuple['demo/EventTree']
branches = tree.arrays()

# Flatten the per-event photon lists before histogramming
plt.hist2d(ak.flatten(branches['phoSCEta']), ak.flatten(branches['phoSCPhi']), bins=150)
plt.suptitle('Photon location', fontsize=16)
plt.xlabel(r'$\eta$')
plt.ylabel(r'$\phi$')

This fails with the error below, and I could not find a solution in the documentation or the other issues:

TypeError: no implementation found for 'numpy.histogram2d' on types that implement __array_function__: [<class 'awkward.highlevel.Array'>]

Does anyone know if there is already an existing implementation that could be used? In case you need the root file: https://drive.google.com/drive/folders/1MAMUybsbuSbJWhCP12dcTRrjUQNwVgGi

agoose77 commented 3 years ago

Awkward Array integrates ak.Array with the NumPy API (which hist2d uses) through the __array_function__ method defined on ak.Array. This method lets Awkward Array provide Awkward-aware implementations of the high-level NumPy functions called on the array. The protocol is described in detail in the NEP (NEP 18).

Awkward doesn't implement an overload for numpy.histogram2d, so calling it on an ak.Array failed in older versions of Awkward.
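To make the dispatch concrete, here is a minimal sketch of a class that opts into __array_function__ and declines every function it has no overload for. The class name Wrapped and its single np.sum overload are hypothetical and are not taken from Awkward; they only illustrate how declining a function leads to the TypeError quoted in the question.

```python
import numpy as np


class Wrapped:
    """Hypothetical stand-in for an array type that implements __array_function__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.sum has an overload in this sketch.
        if func is np.sum:
            unwrapped = [a.data if isinstance(a, Wrapped) else a for a in args]
            return np.sum(*unwrapped, **kwargs)
        # Declining tells NumPy that this type cannot handle the function.
        return NotImplemented


w = Wrapped([1.0, 2.0, 3.0])
total = np.sum(w)  # dispatched to Wrapped.__array_function__

try:
    np.histogram2d(w, w)  # no overload, so NumPy raises TypeError
    error_message = None
except TypeError as err:
    error_message = str(err)

print(total)
print(error_message)
```

When every argument type returns NotImplemented, NumPy raises the "no implementation found" TypeError instead of silently coercing the arguments.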

In later versions (>=1.5.0), there is a fallback that converts any array arguments into rectilinear NumPy arrays and then tries to call the original NumPy function: https://github.com/scikit-hep/awkward-1.0/blob/1.5.0/src/awkward/_connect/_numpy.py#L43-L55
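The idea of such a fallback can be sketched roughly as follows. This is not the code behind the link above: FallbackArray, its empty _overloads table, and the unwrap helper are hypothetical names used only to show the "unwrap, then call the original function" pattern on top-level arguments.

```python
import numpy as np


class FallbackArray:
    """Hypothetical wrapper; a sketch of the pattern, not Awkward's implementation."""

    _overloads = {}  # no specialised overloads are registered in this sketch

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        overload = self._overloads.get(func)
        if overload is not None:
            return overload(*args, **kwargs)

        # Fallback: convert wrapped arguments to plain NumPy arrays,
        # then call the original NumPy function on them.
        def unwrap(x):
            return x.data if isinstance(x, FallbackArray) else x

        args = tuple(unwrap(a) for a in args)
        kwargs = {k: unwrap(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)


x = FallbackArray([1, 2, 3])
y = FallbackArray([4, 5, 6])
counts, xedges, yedges = np.histogram2d(x, y, bins=10)
print(counts.sum())
```

Because the unwrapped arguments are ordinary NumPy arrays, the second call to the function no longer dispatches back to the wrapper.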

So, the "fix" here is to update to the newest version of Awkward, or to explicitly convert your arguments to NumPy arrays before calling hist2d (if you are unable to upgrade).

uzzielperez commented 3 years ago

Great! Thanks a lot for the info and the fix.

jpivarski commented 3 years ago

That's right: it should be fixed now (@agoose77 pointed to the update that did this). I did a quick test:

>>> import awkward as ak
>>> import matplotlib.pyplot as plt
>>> plt.hist2d(ak.Array([1, 2, 3]), ak.Array([4, 5, 6]))
(
    array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]),
    array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]),
    array([4. , 4.2, 4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. ]),
    <matplotlib.collections.QuadMesh object at 0x7f85d1155280>
)

For reference, the arrays do have to be flat, but they don't have to be explicitly converted from "Awkward Brand" arrays into "NumPy Brand."