Closed Saransh-cpp closed 3 months ago
Any update on the tests for this PR?
Hi, this already has tests! I think I forgot to request a review.
Hi @jpivarski, thanks for the review and fixing the PR! Sorry for taking too long to respond here. The MWE below shows why I added the if
condition in this PR -
import awkward as ak
import numpy
class SuperVector:
def add(self, other):
return ak.zip(
{"x": self.x + other.x, "y": self.y + other.y},
with_name="VectorTwoD",
behavior=self.behavior,
)
@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
def __eq__(self, other):
return ak.all(self.x == other.x) and ak.all(self.y == other.y)
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
print(ak.behavior)
@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
pass
print(ak.behavior)
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>
# 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>
ak.behavior.update(
ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior)
)
print(ak.behavior)
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDArray'> -> removing the star condition (the class is wrong)
# ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'> -> removing the string condition
Additionally, replacing the generic string condition with a more specific case -
if oldname != key:
leads to -
{'VectorTwoD': <class '__main__.VectorTwoDRecord'>, ('*', 'VectorTwoD'): <class '__main__.VectorTwoDArray'>, (<ufunc 'add'>, 'VectorTwoD', 'VectorTwoD'): <function <lambda> at 0x1050a7ba0>, 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>, ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>, (<ufunc 'add'>, 'VectorTwoDAgain', 'VectorTwoDAgain'): <function <lambda> at 0x1050a7ba0>, ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'>}
Could you please let me know what should be done here? I tested the current implementation with Coffea and everything works as expected. (I pointed out that there is a bug in copy_behaviors
on Slack but that was a false alarm.)
('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n')
You definitely want to check for the isinstance(key, str)
case and not iterate over the string as though it were a tuple.
I was assuming that you'd copy all of the behaviors, including the "VectorTwoD"
and the ("*", "VectorTwoD")
ones, and then replace the ones that are different for the new class, which may be NumPy overloads, Numba overloads, or whole class overloads. That's why I was thinking you'd want this function:
def copy_behaviors(existing_class: typing.Any, new_class: typing.Any, behavior: dict):
output = {}
oldname = existing_class.__name__
newname = new_class.__name__
for key, value in behavior.items():
if isinstance(key, str):
if key == oldname:
output[newname] = value
else:
if oldname in key:
new_tuple = tuple(newname if k == oldname else k for k in key)
output[new_tuple] = value
return output
and then define the new class after having copied the behaviors:
class VectorTwoD(SuperVector):
def __eq__(self, other):
return ak.all(self.x == other.x) and ak.all(self.y == other.y)
ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
ak.behavior.update(ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior))
class VectorTwoDAgain(VectorTwoD):
pass
ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)
But now I see the problem. You were using the @mixin_class
decorator to add both Array and Record forms of the class to ak.behavior
, which is equivalent to assigning them directly into the ak.behavior
dict, as I did above, but since it's a decorator, it has to be assigned exactly when the class gets defined. The current API for copy_behaviors
takes the class object as an argument, so it has to be called after the class gets defined. That's why you can't copy all behaviors and then overwrite some of them: there are causality constraints on when they can be defined and when copy_behaviors
can be called.
This is going to be a problem because the "VectorTwoD"
and the ("*", "VectorTwoD")
keys are not the only ones that get assigned by decorators: keys like (numpy.add, "VectorTwoD", "VectorTwoD")
can also be defined by @ak.mixin_class_method
, which has to be called at the time when the class is defined. (See ak.mixin_class_method and Mixin decorators.) If the developer writes a class using this instead of assigning a lambda expression, they'd be in just as much trouble as my example above.
The copy_behaviors
function needs to not take the class object as an argument, and it should (as a best practice) be called before the class gets defined. In fact, the function doesn't even use the class objects, just their names; it should only ask for what it's going to need.
This should be enough:
def copy_behaviors(from_name: str, to_name: str, behavior: dict):
output = {}
for key, value in behavior.items():
if isinstance(key, str):
if key == from_name:
output[to_name] = value
else:
if from_name in key:
new_tuple = tuple(to_name if k == from_name else k for k in key)
output[new_tuple] = value
return output
which would then be used like this:
class VectorTwoD(SuperVector):
def __eq__(self, other):
return ak.all(self.x == other.x) and ak.all(self.y == other.y)
ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))
class VectorTwoDAgain(VectorTwoD):
pass
ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)
or like this:
@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
def __eq__(self, other):
return ak.all(self.x == other.x) and ak.all(self.y == other.y)
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))
@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
pass
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)
The general strategy, for a developer who is using the copy_behaviors
function, is to (1) copy everything and (2) replace what's different about this new class. The copy_behaviors
API needs to be flexible enough to let them do that. Since some ways of defining behaviors are bound to a class decorator, constraining when they can be assigned, the copy_behaviors
has to not depend on the class object, or else there will be a circular dependency.
Thank you for the detailed explanation! I went through it and have created a PR with the fix.
How should I interpret the fact this is in the _util
module? Is it private? Is it ok to use in coffea?
copy_behaviors
is definitely intended to be a public API. Maybe I should move the function to another module?
XRef #2433