Open ico1036 opened 3 years ago
Lazy arrays are good for interactive exploration, but the most efficient way to process multiple files with Uproot alone is uproot.iterate, because it ensures that only a manageable amount of data is in memory at once.
I say "with Uproot alone" because if you have a very large number of files, you'll want to distribute the job and run it in parallel. Uproot doesn't do that, as it's strictly an I/O library. Coffea Processors are a convenient way to do that in HEP.
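To make the uproot.iterate suggestion concrete, here is a minimal sketch, assuming Uproot 4 or later and a flat (non-jagged) branch. The helper names accumulate_histogram and iterate_files are hypothetical, as are the file pattern, tree name, and branch name in the commented usage; none of them come from the script linked in the question. The idea is to reduce each chunk to a small summary (here, histogram counts) and discard the chunk before reading the next one.

```python
import numpy as np


def accumulate_histogram(batches, branch, edges):
    """Sum per-batch histograms so only one batch is held in memory at a time."""
    counts = np.zeros(len(edges) - 1, dtype=np.int64)
    for batch in batches:
        # Each batch is a dict mapping branch names to NumPy arrays.
        batch_counts, _ = np.histogram(batch[branch], bins=edges)
        counts += batch_counts
    return counts


def iterate_files(files, branch, step_size="100 MB"):
    """Yield chunks from many files as dicts of NumPy arrays (Uproot 4+)."""
    # Imported here so accumulate_histogram can be tried without Uproot installed.
    import uproot

    yield from uproot.iterate(files, [branch], step_size=step_size, library="np")


# Hypothetical usage; the path, tree name, and branch name are placeholders:
# edges = np.linspace(0.0, 200.0, 41)
# batches = iterate_files("data/*.root:Events", "MET_pt")
# counts = accumulate_histogram(batches, "MET_pt", edges)
```

The step_size argument bounds how much data each chunk holds, so peak memory stays roughly constant regardless of the total dataset size. The Python loop runs once per chunk rather than once per event, so its overhead is usually small compared with reading and decompressing the data.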
Thank you very much! I tested this script and got the following results:
What is the most efficient way to deal with multiple ROOT files (~100 GB) in uproot3 and uproot4? I cannot find a tutorial about this.
I tried lazy arrays, but they take a lot of time.
I also tried the iterator, but I'm not sure whether this loop-based method is efficient (https://github.com/JW-corp/J.W_Analysis/blob/main/Uproot/test/big_data.py).
Thanks.