scikit-hep / uproot3-methods

Pythonic behaviors for non-I/O related ROOT classes.
BSD 3-Clause "New" or "Revised" License
21 stars 28 forks source link

Avoid dynamic creation of ArrayMethods classes and new wrapjaggedmethods decorator #48

Closed guitargeek closed 5 years ago

guitargeek commented 5 years ago

Hi, I suggest here a fix for the issue I explained in https://github.com/scikit-hep/awkward-array/pull/120.

The problem is the dynamic creation of classes in the _unwrapped_jagged function, which causes classes which you would expect to have the same type to actually not the same type.

I suggest two solution approaches which both work for my use case.

1) create the new types ass class members (not instance members) of the classes that use them 2) create the new types outside of the class, just in the scope of the file

I applied solution 2) to the from_ptetaphim function. Creating the types outside the class actually has one more cool benefit: you can use the types with a new decorator I called for now wrapjaggedmethod, which takes care of the _normalize_arrays, _unwrapped_jagged and re-wrapping into a jagged array for you.

I hope these Ideas could be useful to reduce code duplication as well. What to you think? Sorry again for initially implementing a lousy unit test which didn't catch this before...

nsmith- commented 5 years ago

If I understand correctly, this wrapjagged is exactly what I was looking for in https://github.com/CoffeaTeam/fnal-column-analysis-tools/issues/60#issuecomment-473594305 This is a procedure we use often in that package, it would be good for the facility to be available in awkward itself.

jpivarski commented 5 years ago

I was just about to give awkward a new semimajor version, which is a thing that would be required if we want to introduce a cross-repo dependency like this.

The whole file uproot_methods/wrapjagged.py could be moved into awkward/util.py. Do you want me to do that?

guitargeek commented 5 years ago

To me that sounds like a good idea! I would also suggest to decorate all the other from_xyz methods with wrapjaggedmethod, to close the circle. I would volunteer to do that quickly.

The name wrapjaggedmethod is probably not optimal by the way, because it suggest it can only be with classmethods and not with regular functions.

jpivarski commented 5 years ago

@guitargeek Could you create a PR of how you'd like it to look in awkward and then in uproot-methods? I understand that the uproot-methods PR won't succeed until the awkward PR is deployed, at least to a prerelease. I'm going to start a prerelease cycle for all three to get the machinery started.

If it's a large project, please let me know and we'll put it off for later. I want to have a pip-installable version for a talk on Wednesday. Thanks!

nsmith- commented 5 years ago

This makes implementing a few non-ufunc numpy ops trivial, so would be very useful I think! Perhaps the broader idea is a decorator that wraps a function that normally accepts numpy arrays and allows it to be used with awkward (i.e. not just jagged) arrays?

jpivarski commented 5 years ago

@nsmith- If you mean "any function," then that's too broad. I don't see a way to do it in general.

The important thing right now—at the threshold of awkward version 0.9.0, is to get in the API dependencies between awkward and uproot-methods to get the functionality above to work. We need general names, because we can't change names until the next semimajor version, but we can leave holes for additional functionality. For instance, the wrapper might only work on JaggedArrays right now, but we name it something general and let it raise NotImplementedError on non-JaggedArray cases. As long as the non-JaggedArray cases can be implemented in only one repo—awkward and not uproot-methods, for instance—then we can do that at any time and it wouldn't require a new version. It would be a future awkward feature, not a combined awkward/uproot-methods feature, like this one.

guitargeek commented 5 years ago

Alright, is it ok if I quickly come up with two PRs each in awkward and uproot-methods and then we can continue discussing? For now I will rename the decorator to just awkwardfunction I think, but maybe you will find a better name.

jpivarski commented 5 years ago

Yes, focus on the two PRs—no new capabilities, just organizing this capability better. The name will have to explain what it does in contrast to other awkward.util functions, "awkwardfunction" doesn't tell me anything. It would still be correct to call this "wrap" and "unwrap", right? How about unwrap_awkward and wrap_awkward?

I used the word "normalize" in a CS way while that function was private (with an underscore). If it's public and physicists will see it, there's a chance of confusion (making the norm 1). Maybe it should be called uniformly_awkward? It makes all the given arrays have the same structure.

guitargeek commented 5 years ago

Haha right awkwarfunction is a terribly generic name!

Not all functions have to be public, the underscore could still exist for everything else than the decorator. It could be: _normalize_arrays -> _wrap_awkward _unwrap_jagged -> _unwrap_awkward wrapjaggedmethod -> ?? (the decorator)

edit: got the mapping wrong and corrected. Anyway I will not worry about the names now, I cant seem to come up with something good.

jpivarski commented 5 years ago

If uproot-methods is going to call it, the methods must be public. (Underscore means we're free to change it, but if another library, like uproot-methods, depends on it, then changing it would break that without us knowing. That happened once—it's a disaster I'd like to avoid.)

I'm confused by your mapping. Why does "unwrap" become "wrap"? That's the opposite. What _normalize_arrays did wasn't wrapping or unwrapping—it was getting all the arrays in a set to the same level of jaggedness, even if some were scalars or flat arrays. (E.g. allows us to set jagged arrays on pt, eta, phi but a scalar on mass.)

jpivarski commented 5 years ago

scikit-hep/awkward-array#122