scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License
821 stars 85 forks source link

PR 3119 introduced overtouching #3185

Closed ikrommyd closed 1 month ago

ikrommyd commented 1 month ago

Version of Awkward Array

>=2.6.5

Description and code to reproduce

Reproducer:

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak
import hist.dask as hda

def test_necessary_columns():
    events, report = NanoEventsFactory.from_root(
        {
            "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
        },
        metadata={"dataset": "Test"},
        uproot_options={"allow_read_errors_with_report": True},
    ).events()

    events["Electron", "pdgId"] = -11 * events.Electron.charge
    events["Muon", "pdgId"] = -13 * events.Muon.charge
    events["leptons"] = ak.concatenate(
        [events.Electron, events.Muon],
        axis=1,
    )
    events = events[ak.num(events.leptons) >= 3]
    pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])
    pair = pair[(events.leptons[pair.l1].pdgId == -events.leptons[pair.l2].pdgId)]

    pair = pair[
        ak.singletons(
            ak.argmin(
                abs((events.leptons[pair.l1] + events.leptons[pair.l2]).mass - 91.2),
                axis=1,
            )
        )
    ]

    events = events[ak.num(pair) > 0]
    pair = pair[ak.num(pair) > 0][:, 0]

    l3 = ak.local_index(events.leptons)
    l3 = l3[(l3 != pair.l1) & (l3 != pair.l2)]
    l3 = l3[ak.argmax(events.leptons[l3].pt, axis=1, keepdims=True)]
    l3 = events.leptons[l3][:, 0]

    mt = np.sqrt(2 * l3.pt * events.MET.pt * (1 - np.cos(events.MET.delta_phi(l3))))
    q8_hist = (
        hda.Hist.new.Reg(100, 0, 200, name="mt", label="$\ell$-MET transverse mass [GeV]")
        .Double()
        .fill(mt)
    )
    print(dak.necessary_columns(q8_hist))
    if len(str(dak.necessary_columns(q8_hist))) > 1000:
        return 1
    else:
        return 0

exit(test_necessary_columns())

The bug is introduced by https://github.com/scikit-hep/awkward/pull/3119

My git bisect report:

(work) ➜  awkward git:(v2.6.5) ✗ git bisect start
M   awkward-cpp/rapidjson
Previous HEAD position was 45c708d2 chore: the next release will be 2.6.5
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
status: waiting for both good and bad commits
(work) ➜  awkward git:(main) ✗ git bisect bad
status: waiting for good commit(s), bad commit known
(work) ➜  awkward git:(main) ✗ git bisect good 727ff9ca4759aed52e0b4f21d3daa66724582c61
Bisecting: 11 revisions left to test after this (roughly 4 steps)
[02858f895f89f7e6292dc3a8d7e079d596885286] feat: Adding awkward.semipublic submodule (#3152)
(work) ➜  awkward git:(02858f89) ✗ git bisect run python bug.py
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-781bf991cccf86d40c7bf9a7cf6777ff': frozenset({'Muon_segmentComp', 'Electron_dxyErr', 'Muon_miniPFRelIso_chg', 'Electron_mvaFall17V2Iso_WP80', 'Muon_miniIsoId', 'Electron_cutBased', 'Muon_inTimeMuon', 'Electron_cutBased_Fall17_V1', 'Electron_tightCharge', 'Electron_jetRelIso', 'nGenPart', 'Muon_pfRelIso03_chg', 'Muon_tightId', 'Electron_cutBased_HEEP', 'Electron_mvaFall17V2noIso_WPL', 'Electron_vidNestedWPBitmap', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V2Iso_WP90', 'Electron_dr03TkSumPt', 'Muon_nTrackerLayers', 'Electron_hoe', 'Electron_pfRelIso03_chg', 'Electron_miniPFRelIso_chg', 'Muon_isTracker', 'Muon_nStations', 'Muon_fsrPhotonIdx', 'Muon_dz', 'Muon_sip3d', 'Muon_genPartIdx', 'Electron_dz', 'Electron_dr03HcalDepth1TowerSumEt', 'Muon_tunepRelPt', 'Electron_dr03TkSumPtHEEP', 'nMuon', 'Electron_mvaFall17V2noIso_WP80', 'Muon_dzErr', 'Electron_dzErr', 'Muon_looseId', 'Muon_mvaLowPt', 'Muon_pfIsoId', 'Electron_energyErr', 'Electron_mvaFall17V1Iso', 'Electron_eta', 'Muon_softId', 'Muon_pfRelIso03_all', 'Electron_charge', 'Electron_mvaFall17V1noIso', 'Electron_miniPFRelIso_all', 'Electron_pt', 'Electron_mvaFall17V1noIso_WPL', 'Muon_mvaId', 'Muon_isPFcand', 'Muon_isGlobal', 'Electron_convVeto', 'Muon_mediumPromptId', 'Muon_tightCharge', 'nFsrPhoton', 'Muon_charge', 'Electron_mvaFall17V1Iso_WP80', 'Electron_sip3d', 'Electron_pfRelIso03_all', 'Electron_eInvMinusPInv', 'Muon_pfRelIso04_all', 'Electron_mvaFall17V1Iso_WPL', 'Muon_tkRelIso', 'Muon_mvaTTH', 'Muon_eta', 'Muon_cleanmask', 'nJet', 'Muon_ptErr', 'Electron_genPartFlav', 'Electron_lostHits', 'Muon_miniPFRelIso_all', 'Muon_dxy', 'Electron_vidNestedWPBitmapHEEP', 'Muon_phi', 'Electron_mvaFall17V1noIso_WP90', 'Electron_mass', 'MET_pt', 'Muon_multiIsoId', 'Electron_mvaFall17V1noIso_WP80', 'Muon_mass', 'Electron_r9', 'Muon_softMvaId', 'Electron_eCorr', 'Electron_jetPtRelv2', 'Electron_mvaTTH', 'Muon_jetIdx', 'Muon_mediumId', 'Electron_dr03EcalRecHitSumEt', 'Muon_pt', 'Muon_softMva', 'Electron_phi', 'Muon_triggerIdLoose', 'Electron_photonIdx', 'Electron_jetIdx', 'MET_phi', 'Electron_ip3d', 'Electron_seedGain', 'Electron_deltaEtaSC', 'Muon_genPartFlav', 'Electron_mvaFall17V2noIso_WP90', 'Muon_jetRelIso', 'nPhoton', 'Electron_cleanmask', 'Electron_mvaFall17V2Iso', 'Muon_jetPtRelv2', 'Muon_highPtId', 'Electron_dxy', 'Muon_dxyErr', 'Electron_mvaFall17V2Iso_WPL', 'nElectron', 'Electron_genPartIdx', 'Muon_tkIsoId', 'Muon_ip3d', 'Electron_isPFcand', 'Electron_sieie', 'Electron_mvaFall17V2noIso'})}
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[53dc0e266dc346146ef3c581deb98545f89270f4] ci: Ensure attestations for awkward-cpp sdist and wheels (#3135)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-81595a16d812d8e421ed4238952ce794': frozenset({'Electron_eCorr', 'Electron_mvaFall17V2Iso', 'Muon_nStations', 'Electron_deltaEtaSC', 'Electron_jetIdx', 'Muon_ptErr', 'Muon_softMvaId', 'Electron_cutBased', 'nFsrPhoton', 'Muon_tunepRelPt', 'Electron_vidNestedWPBitmapHEEP', 'Electron_seedGain', 'Electron_convVeto', 'Muon_tkIsoId', 'Electron_hoe', 'Electron_dxy', 'Electron_mvaFall17V1Iso_WP90', 'Electron_dr03HcalDepth1TowerSumEt', 'nGenPart', 'Electron_mvaFall17V2Iso_WP90', 'Muon_charge', 'Muon_pt', 'Electron_dz', 'Muon_ip3d', 'Muon_phi', 'Electron_genPartIdx', 'Muon_miniIsoId', 'Electron_mvaTTH', 'MET_pt', 'Electron_dr03TkSumPt', 'Muon_mvaId', 'Muon_dxyErr', 'Electron_mvaFall17V1Iso', 'Electron_mvaFall17V2noIso_WP90', 'Electron_mvaFall17V2Iso_WPL', 'Muon_dxy', 'MET_phi', 'Electron_jetRelIso', 'Electron_pfRelIso03_chg', 'Muon_nTrackerLayers', 'Muon_mvaLowPt', 'Electron_mvaFall17V1noIso_WP80', 'nJet', 'Muon_mediumPromptId', 'Electron_ip3d', 'Muon_segmentComp', 'Electron_pt', 'Electron_dr03TkSumPtHEEP', 'Muon_pfRelIso04_all', 'Muon_mediumId', 'Electron_jetPtRelv2', 'Electron_cutBased_HEEP', 'Electron_mvaFall17V1Iso_WPL', 'Muon_dzErr', 'Electron_mvaFall17V1noIso_WP90', 'Electron_sip3d', 'Muon_genPartIdx', 'Electron_phi', 'nMuon', 'Electron_mvaFall17V1Iso_WP80', 'Electron_miniPFRelIso_all', 'Muon_genPartFlav', 'Muon_tightId', 'Muon_pfIsoId', 'Electron_sieie', 'Electron_eta', 'Electron_isPFcand', 'Muon_jetRelIso', 'Muon_tkRelIso', 'Electron_genPartFlav', 'Electron_pfRelIso03_all', 'Muon_looseId', 'Electron_vidNestedWPBitmap', 'Muon_fsrPhotonIdx', 'Electron_r9', 'Electron_mvaFall17V2noIso_WP80', 'Muon_softId', 'Electron_cleanmask', 'Muon_jetPtRelv2', 'Electron_dr03EcalRecHitSumEt', 'Electron_cutBased_Fall17_V1', 'Muon_triggerIdLoose', 'Electron_mvaFall17V2noIso_WPL', 'Muon_isPFcand', 'Electron_mvaFall17V2noIso', 'Muon_jetIdx', 'Muon_multiIsoId', 'Muon_sip3d', 'Muon_mvaTTH', 'Electron_mass', 'Electron_mvaFall17V1noIso_WPL', 'Muon_mass', 'Electron_photonIdx', 'nPhoton', 'Muon_miniPFRelIso_chg', 'nElectron', 'Muon_tightCharge', 'Muon_isGlobal', 'Muon_highPtId', 'Muon_cleanmask', 'Electron_dzErr', 'Electron_mvaFall17V1noIso', 'Muon_eta', 'Electron_mvaFall17V2Iso_WP80', 'Muon_miniPFRelIso_all', 'Electron_energyErr', 'Electron_tightCharge', 'Muon_isTracker', 'Muon_dz', 'Muon_softMva', 'Electron_dxyErr', 'Muon_pfRelIso03_all', 'Electron_eInvMinusPInv', 'Muon_inTimeMuon', 'Electron_charge', 'Electron_lostHits', 'Electron_miniPFRelIso_chg', 'Muon_pfRelIso03_chg'})}
Bisecting: 2 revisions left to test after this (roughly 1 step)
[28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc] fix: prevent exponential memory growth in UnionArray (#3119)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-685afe5860956e841b4f6d433d7c6718': frozenset({'Electron_mvaTTH', 'Muon_multiIsoId', 'Electron_mvaFall17V1Iso_WP80', 'Electron_dr03EcalRecHitSumEt', 'Muon_dxyErr', 'Electron_convVeto', 'Muon_tightCharge', 'Electron_mvaFall17V2Iso_WP90', 'Electron_mvaFall17V1noIso_WP80', 'Muon_miniPFRelIso_chg', 'Electron_cleanmask', 'Electron_pt', 'Electron_mvaFall17V2noIso_WP90', 'Electron_mvaFall17V2Iso', 'Muon_mvaTTH', 'Muon_mvaId', 'Electron_cutBased_Fall17_V1', 'Muon_pfRelIso04_all', 'Muon_jetRelIso', 'Electron_dxy', 'Electron_mvaFall17V2noIso', 'Muon_ip3d', 'Muon_jetPtRelv2', 'Electron_cutBased_HEEP', 'Electron_vidNestedWPBitmap', 'Electron_jetIdx', 'Muon_genPartIdx', 'Electron_seedGain', 'Electron_lostHits', 'Muon_phi', 'Muon_mediumPromptId', 'Muon_sip3d', 'nGenPart', 'Muon_isTracker', 'Electron_isPFcand', 'Muon_miniPFRelIso_all', 'Electron_mvaFall17V2noIso_WPL', 'Electron_tightCharge', 'Electron_mvaFall17V2noIso_WP80', 'Muon_tkIsoId', 'Electron_r9', 'Electron_ip3d', 'Electron_pfRelIso03_chg', 'Electron_phi', 'Muon_isPFcand', 'Muon_nTrackerLayers', 'Electron_eInvMinusPInv', 'Electron_vidNestedWPBitmapHEEP', 'Electron_jetRelIso', 'nJet', 'Electron_mvaFall17V1Iso_WPL', 'Muon_triggerIdLoose', 'Muon_genPartFlav', 'Electron_genPartIdx', 'Electron_eta', 'Electron_mvaFall17V2Iso_WP80', 'Electron_mvaFall17V1noIso_WP90', 'Electron_dz', 'Electron_dr03HcalDepth1TowerSumEt', 'Electron_sip3d', 'Muon_mass', 'Electron_dzErr', 'Muon_miniIsoId', 'MET_pt', 'Muon_nStations', 'Electron_mass', 'Muon_eta', 'Muon_pfRelIso03_chg', 'Electron_dxyErr', 'nPhoton', 'Muon_pfIsoId', 'MET_phi', 'Muon_isGlobal', 'Electron_cutBased', 'Muon_softMvaId', 'Electron_jetPtRelv2', 'nElectron', 'Muon_looseId', 'Muon_softId', 'Muon_cleanmask', 'Muon_highPtId', 'Muon_segmentComp', 'Electron_photonIdx', 'Muon_pfRelIso03_all', 'nFsrPhoton', 'Muon_jetIdx', 'Electron_mvaFall17V1noIso', 'Muon_dzErr', 'Electron_energyErr', 'Muon_mediumId', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V1noIso_WPL', 'Electron_miniPFRelIso_chg', 'Electron_charge', 'Muon_charge', 'Muon_pt', 'Muon_inTimeMuon', 'Electron_pfRelIso03_all', 'Electron_sieie', 'Muon_ptErr', 'nMuon', 'Electron_miniPFRelIso_all', 'Electron_dr03TkSumPtHEEP', 'Electron_deltaEtaSC', 'Muon_tightId', 'Muon_mvaLowPt', 'Muon_fsrPhotonIdx', 'Muon_tkRelIso', 'Muon_tunepRelPt', 'Muon_softMva', 'Electron_eCorr', 'Electron_hoe', 'Electron_genPartFlav', 'Electron_dr03TkSumPt', 'Muon_dz', 'Electron_mvaFall17V2Iso_WPL', 'Muon_dxy', 'Electron_mvaFall17V1Iso'})}
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[6cff8e939f8b9365bece63b59919b823cd35304c] fix: wait for Jitify performing a one-time only warm-up (#3113)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-bb74eb145870034e6b34d269e2b468e2': frozenset({'Muon_charge', 'nElectron', 'Electron_mass', 'Electron_pt', 'Electron_eta', 'Electron_phi', 'Muon_pt', 'MET_pt', 'nMuon', 'Muon_phi', 'Muon_eta', 'Muon_mass', 'MET_phi', 'Electron_charge'})}
28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc is the first bad commit
commit 28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc
Author: Jim Pivarski <jpivarski@users.noreply.github.com>
Date:   Tue May 28 20:13:46 2024 +0530

    fix: prevent exponential memory growth in UnionArray (#3119)

 src/awkward/contents/unionarray.py                 |  4 +--
 tests/test_2713_from_buffers_allow_noncanonical.py |  7 +++--
 ...vent_exponential_memory_growth_in_unionarray.py | 34 ++++++++++++++++++++++
 3 files changed, 40 insertions(+), 5 deletions(-)
 create mode 100644 tests/test_3118_prevent_exponential_memory_growth_in_unionarray.py
bisect found first bad commit
(work) ➜  awkward git:(6cff8e93) ✗
lgray commented 1 month ago

a ping on this one @jpivarski @agoose77 ... this is rather badly noticeable.

jpivarski commented 1 month ago

I'm trying to reproduce this using only Awkward, both to narrow the scope of search for the bug and to add a test to our test suite. Since you've narrowed in on #3119 as a cause, I think it must have something to do with RecordArrays in IndexedArrays, and ak.argcombinations or the application of it (i.e. ak.combinations) is also suspicious because it makes combinations of records by hanging them inside of IndexedArrays. I don't think the issue could be related to any of the calculations on scalars, since #3119 can't have anything to do with non-RecordArrays.

But so far, I haven't been able to reproduce it. I'm starting from a dataset that's already grouped into "electron" and "muon" records, for convenience:

dataset = ak.from_parquet("https://github.com/jpivarski-talks/2024-07-24-codas-hep-ml/raw/main/data/SMHiggsToZZTo4L.parquet")

I make leptons, so that we have a UnionArray:

leptons = ak.concatenate([dataset.electron, dataset.muon], axis=1)

then make a TypeTracerArray report to track touching:

layout, report = ak.typetracer.typetracer_with_report(leptons.layout.form_with_key())

and it's initially empty:

print(report.data_touched, report.shape_touched)
# [] []

Slicing through both the electrons and muons touches only the expected buffers:

ak.Array(layout).pt
print(report.data_touched, report.shape_touched)
# ['node1', 'node3', 'node14'] ['node0', 'node1', 'node3', 'node14']

The node0 is the outer ListOffsetArray's offsets, the node1 is the UnionArray tags and index, node3 is the electron pT and node14 is the muon pT.

None of the following touch any more buffers than the above:

Printing arrays to the screen would touch more buffers (hence the precaution of assigning to tmp) and calculating masses would pull in all of the kinematics (node4, node5, node6, node15, node16, node17), but at that point, we're not working on RecordArrays anymore. I'm looking for something that you did that touches ~everything in this example.

Okay, ak.local_index touches everything:

ak.local_index(ak.Array(layout))

and I can't think of why that would be. I'm looking into it now...

jpivarski commented 1 month ago

Actually, the above only touches the shape of everything. Is that consistent with what you were seeing? That the shape of everything was being touched (as opposed to the data)?

jpivarski commented 1 month ago

But

ak.local_index(ak.Array(layout))

touching the shape of many buffers also happens if I revert #3119, so it might not be your issue.

jpivarski commented 1 month ago

ak.local_index has a default argument of axis=-1, so determining the actual axis depth goes through ak._layout.maybe_posaxis, and this touches everybody's shape. It happens on this line:

https://github.com/scikit-hep/awkward/blob/bf1c8746812d55cb9cec23e954188044b9a7fcfc/src/awkward/_layout.py#L387

Perhaps branch_depth shouldn't be touching shape. But before I dig deeper into this, is it your issue? For instance, is the issue caused by your

    l3 = ak.local_index(events.leptons)

line? (If you stop computing just before this line, do you get no error? If you stop computing just after this line, do you get the error?)

Also, the maybe_posaxis/branch_depth touching would go away with

    l3 = ak.local_index(events.leptons, axis=1)

(instead of the default axis=-1). Does that also make the error go away? If so, then we know the cause and I think I can find a way to fix it. If not, then this maybe_posaxis/branch_depth thing might or might not be an issue, but it isn't your issue.

lgray commented 1 month ago

No, this does not fix the issue. :-/

I'll see if I can synthesize an awkward only reproducer.

ikrommyd commented 1 month ago

Got it a little more minimal for now. This is still overtouching

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak

events = NanoEventsFactory.from_root(
    {
        "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
    },
).events()

events["leptons"] = ak.concatenate(
    [events.Electron, events.Muon],
    axis=1,
)
pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

pair = pair[
    ak.argmin(
        (events.leptons[pair.l1] + events.leptons[pair.l2]).pt,
        axis=1,
    )
]

events = events[ak.num(pair) > 0]
l3 = events.leptons

print(dak.necessary_columns(l3.pt))
print(l3.pt.compute())

Everything here seems improtant. If I try do remove anything, like the events = events[ak.num(pair) > 0], the concatenation and just do events["leptons"] = evens.Electron or within the argmin have something like events.leptons[pair.l1].pt that doesn't include both l1 and l2, overtouchin goes away.

ikrommyd commented 1 month ago

This is also happening. There's overtouching on events.Muon.pt

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak

events = NanoEventsFactory.from_root(
    {
        "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
    },
).events()

events["leptons"] = ak.concatenate(
    [events.Electron, events.Muon],
    axis=1,
)
pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

pair = pair[
    ak.argmin(
        (events.leptons[pair.l1] + events.leptons[pair.l2]).pt,
        axis=1,
    )
]

events = events[ak.num(pair) > 0]

print(dak.necessary_columns(events.Muon.pt))
{'from-uproot-85eb9d204817e27e080b72855fb7c5d3': frozenset({'Muon_isTracker', 'Electron_mass', 'Electron_mvaFall17V1noIso', 'nJet', 'Electron_eta', 'Muon_dxy', 'Electron_mvaTTH', 'Muon_softMvaId', 'Muon_sip3d', 'Muon_pfRelIso03_chg', 'nMuon', 'Electron_pfRelIso03_all', 'Muon_isGlobal', 'Muon_tunepRelPt', 'Muon_miniPFRelIso_chg', 'Electron_isPFcand', 'Muon_jetIdx', 'Electron_dz', 'nFsrPhoton', 'Muon_pfRelIso03_all', 'nGenPart', 'Electron_dzErr', 'nPhoton', 'Electron_pdgId', 'Muon_genPartIdx', 'Muon_highPtId', 'Muon_jetPtRelv2', 'Muon_tightCharge', 'Electron_mvaFall17V2Iso_WP80', 'Electron_mvaFall17V2noIso_WP80', 'Electron_convVeto', 'Electron_mvaFall17V1Iso', 'Electron_lostHits', 'Muon_tkIsoId', 'Electron_eCorr', 'Electron_dxy', 'Electron_genPartFlav', 'Muon_mediumPromptId', 'Muon_miniPFRelIso_all', 'Electron_miniPFRelIso_all', 'Muon_pfRelIso04_all', 'Muon_inTimeMuon', 'Electron_jetRelIso', 'Muon_pt', 'Muon_mvaLowPt', 'Electron_cutBased', 'Electron_cutBased_HEEP', 'Muon_fsrPhotonIdx', 'Electron_charge', 'Muon_miniIsoId', 'Muon_mass', 'Electron_cleanmask', 'Electron_mvaFall17V2Iso_WPL', 'Electron_mvaFall17V1noIso_WP90', 'Electron_vidNestedWPBitmapHEEP', 'Electron_pfRelIso03_chg', 'Muon_dz', 'Muon_isPFcand', 'Muon_triggerIdLoose', 'Electron_miniPFRelIso_chg', 'Electron_dr03EcalRecHitSumEt', 'Electron_sieie', 'Muon_tightId', 'Electron_photonIdx', 'Muon_pfIsoId', 'Electron_r9', 'Electron_mvaFall17V1noIso_WP80', 'Electron_mvaFall17V1noIso_WPL', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V1Iso_WP80', 'Muon_multiIsoId', 'Electron_mvaFall17V1Iso_WPL', 'Muon_eta', 'Electron_mvaFall17V2noIso_WP90', 'Electron_phi', 'Muon_tkRelIso', 'Muon_ip3d', 'Electron_jetPtRelv2', 'Electron_cutBased_Fall17_V1', 'Muon_softMva', 'Electron_hoe', 'Muon_jetRelIso', 'Muon_dzErr', 'Muon_genPartFlav', 'Muon_pdgId', 'Electron_jetIdx', 'Muon_dxyErr', 'Electron_energyErr', 'Muon_cleanmask', 'Electron_deltaEtaSC', 'Electron_dxyErr', 'Muon_looseId', 'Muon_charge', 'Muon_mvaTTH', 'Electron_seedGain', 'Electron_mvaFall17V2Iso', 'Electron_ip3d', 'nElectron', 'Electron_dr03HcalDepth1TowerSumEt', 'Electron_sip3d', 'Muon_softId', 'Electron_dr03TkSumPt', 'Electron_dr03TkSumPtHEEP', 'Electron_mvaFall17V2Iso_WP90', 'Electron_mvaFall17V2noIso', 'Muon_ptErr', 'Electron_tightCharge', 'Muon_mvaId', 'Muon_nStations', 'Muon_phi', 'Muon_segmentComp', 'Muon_mediumId', 'Electron_pt', 'Electron_eInvMinusPInv', 'Muon_nTrackerLayers', 'Electron_vidNestedWPBitmap', 'Electron_mvaFall17V2noIso_WPL', 'Electron_genPartIdx'})}
ikrommyd commented 1 month ago

Could https://github.com/dask-contrib/dask-awkward/issues/526 be somehow part of the problem as well since concatenation and argcombinations are required?

lgray commented 1 month ago

Here is a minimal reproducer using awkward and one coffea behavior:

import awkward as ak

from coffea.nanoevents.methods import nanoaod

def test_necessary_columns():

    with open("nanoevents_form.json", "r") as fin:
        form = ak.forms.from_json(fin.read())

    events_layout, report = ak.typetracer.typetracer_with_report(form)

    events = ak.Array(events_layout, behavior=nanoaod.behavior)

    events["leptons"] = ak.concatenate(
        [events.Electron, events.Muon],
        axis=1,
    )
    pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

    # looks like all that is needed to trigger it is the sum with behaviors?
    ptsum = (events.leptons[pair.l1] + events.leptons[pair.l2]).pt

    if len(str(report.data_touched)) > 1000:
        print("bad!")
        return 1
    else:
        print("good!")
        return 0

exit(test_necessary_columns())

I have attached the json of the input form to this post. nanoevents_form.json

jpivarski commented 1 month ago

@pfackeldey, @maxgalli, and I looked into this. The over-touching problem comes from broadcasting through a union (the leptons) to apply a ufunc (the np.add) here:

https://github.com/scikit-hep/awkward/blob/bf1c8746812d55cb9cec23e954188044b9a7fcfc/src/awkward/_broadcasting.py#L946

Before PR #3119, UnionArray.project replaced a RecordArray with an IndexedArray of RecordArray ("lazily projecting"), such that if the UnionArray only exposed a subset of records, the IndexedArray would only have indexes for those records. Whatever fields the record contained were not touched.

After PR #3119, UnionArray.project became eager: replacing a RecordArray with a RecordArray of only the records that the UnionArray exposed—the change is here. Since we're slicing the records non-lazily, it must touch all the fields of the record.

PR #3119 was necessary because assignments like cat["another", "w"] = three.x were causing the memory to explode. As in tests/test_3118_prevent_exponential_memory_growth_in_unionarray.py, every time nested records in a union were assigned to with

cat["another", "w"] = three.x

the size of an internal buffer doubled. (So imagine assigning 20 fields into such a record.) The exponential growth of that internal buffer had something to do with the interplay between broadcasting and lazy projection, and so I fixed it by removing lazy projection.

@pfackeldey is looking into the problem, to see if there's some other way to avoid exponential memory growth of that buffer, while keeping this one project operation lazy so that it doesn't touch all fields.

Both of these problems are tightly coupled to the fact that these are UnionArrays. They wouldn't happen with any other array type.

ikrommyd commented 1 month ago

Thanks for investigating!

jpivarski commented 1 month ago

Fixed by #3193.