scikit-hep / uproot3

ROOT I/O in pure Python and NumPy.
BSD 3-Clause "New" or "Revised" License
314 stars 67 forks source link

uproot inserts `5f` at several positions in member names of root objects #520

Closed maxnoe closed 4 years ago

maxnoe commented 4 years ago

Reading a root object, defined here:

https://github.com/Eventdisplay/Eventdisplay/blob/master/inc/VMonteCarloRunHeader.h

uproot is able to read the object and the data therein but it inserts seemingly random 5f s into some of the attribute names:

 '_num_5f_showers',
 '_num_5f_use',
 '_obsheight',
 '_primary_5f_id',
 '_viewcone',

Is this intendend? Why does this only appear in some members?

import uproot

f = uproot.open('./gamma_cone.Nb.3AL4-BN15-TI_ID0.eff-0.root')
run_header  = f['MC_runheader']

print(run_header._fields)

print([k.replace('_5f_', '_') for k in run_header._fields])

Unfortunately the only files I have are quite large

maxnoe commented 4 years ago

It seems to replace underscores with _5f_

jpivarski commented 4 years ago

C++ class names can include more characters than Python class names can, particularly :: for namespaces and < > for templates, so they have to be encoded. They're encoded by replacing the non-alphanumeric characters to hex equivalents, separated from the rest of the name by underscores. We don't want an underscore by itself to be confused with the beginning of an encoding, so underscores are converted as well (5f is hex for underscore's ASCII value).

That's for class names, but you're talking about class members, which have the same character restrictions as Python—why convert those? Well, C++ names (both for classes and members) also have to be converted from ASCII to Unicode: Python 3 names are Unicode. For simplicity, Uproot 3 had only one "make names okay for Python" function, which did both encodings.

Uproot 4 still has the encoding for C++ class names (and it's better formalized, with explicit, public functions for translation), but C++ class members are now isolated from Python class attributes by putting them in an entirely different namespace:

(Similarly, when looking up classes by name, one uses the C++ name in quotes.)

So, this wasn't a bug—it was intentional—but perhaps surprising. I'm hoping that the new method is less surprising.

maxnoe commented 4 years ago

Yes, using uproot4 and e.g. the all_members attribute works fine.

So I guess I am now looking forward to the 4.0 release more than ever...

jpivarski commented 4 years ago

It's not a future thing—you can use it now. It had two features missing: documentation and file-writing. Last week, I finished adding docstrings to all modules, classes, functions, and methods; what remains is to put them all on the web with readthedocs—other things have been keeping me from that last step. File-writing will be a little longer, and perhaps I should switch PyPI names (uproot4uproot, uprootuproot3) before that, with an API in place that says, "For file-writing, import uproot3. Sorry!" so that more people try it out.

tamasgal commented 4 years ago

I'm hoping that the new method is less surprising.

The new method is definitely more user-friendly when debugging!

Offtopic: I collected a few caveats during my uproot3 -> uproot4 transition (still ongoing), so if you want, I can wrap them up and they could be dumped into a special section in the docs ("Migrating from uproot3"?)

jpivarski commented 4 years ago

Since wikis aren't where people look for documentation, I converted the Awkward wiki into a "collecting thoughts on a Awkward 0 → Awkward 1 cheat sheet/migration guide".

It makes sense to do the same for Uproot: I just created a wiki here, too.

These can be totally disorganized; I'll tidy them up when converting them to user-facing documentation. Short code examples are good, but not long ones or ones that depend on anything hard to reproduce. scikit-hep-datadata is a good source of world-accessible files.

jpivarski commented 4 years ago

Oh, and I'll be closing this because it's not a bug. (I'll admit it does look suspicious; thanks for the feedback, @MaxNoe!)