iris-hep / idap-200gbps

MIT License

Avoid `count_nonzero` calls #20

Open pfackeldey opened 2 days ago

pfackeldey commented 2 days ago

We should be able to load the data without running any computation on it, using this trick:

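import dask_awkward as dak

def load_columns(events, columns):
    # Touch each requested column and discard the result, so the column is
    # read without running any reduction on it.
    for col in columns:
        _ = events[col]
    return events

# Apply per partition; `events` is the existing dask-awkward collection.
dak.map_partitions(load_columns, events, [("Jet", "pt"), ("Jet", "eta"), ...])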

This way we avoid the unnecessary `ak.count_nonzero` calls in `materialize_branches.ipynb`.
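
To illustrate why touching a column could be enough, here is a toy sketch in plain Python. `LazyEvents` and its `loaded` attribute are hypothetical stand-ins invented for this example; they are not part of dask-awkward or awkward. Whether dask-awkward's column handling treats a bare `events[col]` access the same way is an assumption that would still need to be checked in the notebook.

```python
# Toy stand-in for a lazily loaded event record; NOT the dask-awkward API.
class LazyEvents:
    """Reads a column from the (hypothetical) backing store on first access."""

    def __init__(self, store):
        self._store = store  # maps column key -> list of values
        self.loaded = {}     # columns that have actually been read

    def __getitem__(self, col):
        if col not in self.loaded:
            # In a real reader this is where the file I/O would happen.
            self.loaded[col] = self._store[col]
        return self.loaded[col]


def load_columns(events, columns):
    # Touch each requested column and discard the result: the access itself
    # triggers the read, so no reduction such as a count is needed.
    for col in columns:
        _ = events[col]
    return events


# Made-up example data, keyed the same way as the columns in the snippet above.
store = {
    ("Jet", "pt"): [41.0, 27.5, 19.2],
    ("Jet", "eta"): [0.3, -1.1, 2.0],
    ("Muon", "pt"): [12.4],
}
events = load_columns(LazyEvents(store), [("Jet", "pt"), ("Jet", "eta")])
print(sorted(events.loaded))
```

In this sketch only the two requested `Jet` columns end up in `loaded`; `("Muon", "pt")` is never read because nothing touched it.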

alexander-held commented 2 days ago

This might also help keep memory requirements low. It looks like a good setup to try out.