scikit-hep / root_pandas

A Python module for conveniently loading/saving ROOT files as pandas DataFrames
MIT License

Reading file with chunksize now gives iterator error #87

Status: Closed (closed by mdpunch 4 years ago)

mdpunch commented 4 years ago

Hello,

Thanks for the package.

However, after an update (I am now using root_pandas 0.7.0 with Python 3.7.3), my code for reading a file in chunks is broken.

Previously:

  read_root_iter = read_root(file,chunksize=100000)
  df = next(read_root_iter)

worked as expected, with read_root_iter being an iterator, with a __next__ method.

Currently, this gives an error:

  TypeError                                 Traceback (most recent call last)
        1 read_root_iter = read_root(file,chunksize=100000)
  ----> 2 df = next(read_root_iter)

  TypeError: 'genchunk' object is not an iterator

However, doing

read_root_iter = read_root(file,chunksize=100000)
for df in read_root_iter:
    pass  # Handle df here

still works (but would require lots of code restructuring for me).

Is this change in behaviour expected, and is there some way to use the returned "iterator" from read_root as an actual iterator?

For now, I have downgraded to root_pandas 0.6.0, where the iterator works.

Good Luck, Michael.

chrisburr commented 4 years ago

This is the change which is causing your problem: https://github.com/scikit-hep/root_pandas/compare/v0.6.0...v0.6.1#diff-f1a13e96d09db54e1548192242ca5ce4

One workaround would be to change it to iter(read_root(file, chunksize=100000)). Alternatively, if you want to open a pull request, I'm happy to make a new release.

Be aware though that root_pandas is effectively deprecated now because root_numpy is no longer maintained (root_pandas is simply a very small wrapper around it).
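To illustrate why wrapping the result in iter() helps, here is a minimal sketch that does not need ROOT or root_pandas. The GenChunk class is a hypothetical stand-in, not the real root_pandas code; it only assumes that the returned object defines __iter__ but not __next__, which is what the TypeError above suggests.

```python
class GenChunk:
    """Hypothetical stand-in for the 'genchunk' object: iterable, but not an iterator."""

    def __init__(self, n_rows, chunksize):
        self.n_rows = n_rows
        self.chunksize = chunksize

    def __iter__(self):
        # Each call starts a fresh generator over (start, stop) row ranges.
        for start in range(0, self.n_rows, self.chunksize):
            yield (start, min(start + self.chunksize, self.n_rows))


chunks = GenChunk(n_rows=250, chunksize=100)

# next() needs __next__, which GenChunk does not define, so this raises TypeError.
try:
    next(chunks)
    direct_next_error = None
except TypeError as exc:
    direct_next_error = str(exc)

# iter() calls __iter__ and returns a generator, which is a real iterator.
chunk_iter = iter(chunks)
first = next(chunk_iter)
second = next(chunk_iter)
```

A for loop calls iter() implicitly, which would explain why the loop form keeps working while a bare next() does not.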

Better functionality and performance can be obtained either using uproot, which natively supports tree.pandas.df(), or in ROOT itself using pandas.DataFrame(RDataFrame('key', 'filename.root').AsNumpy()).
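For chunked reading specifically, a rough sketch with uproot could look like the following. It assumes uproot 4 or later, where uproot.iterate accepts step_size and library="pd"; that is a newer interface than the tree.pandas.df() spelling mentioned above. The helper name iter_root_chunks is made up for illustration.

```python
def iter_root_chunks(path, tree_name, chunksize=100000):
    """Yield pandas DataFrames of up to `chunksize` entries each.

    Hypothetical helper; assumes uproot (version 4 or later) and pandas are installed.
    """
    import uproot  # imported lazily, so defining the helper does not require uproot

    # {path: tree_name} selects one TTree; an integer step_size counts entries.
    for df in uproot.iterate({path: tree_name}, step_size=chunksize, library="pd"):
        yield df
```

Because this is a generator function, its return value is a true iterator, so chunks = iter_root_chunks("filename.root", "key") followed by df = next(chunks) should behave like the original read_root pattern.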

mdpunch commented 4 years ago

Wow! Thanks for the rapid response!

For my limited current use, your workaround is fine, so no need for a pull request.

I will take your suggestion of using uproot for the future.

Thanks again!