Open OrestZborowski-SIG opened 2 years ago
This is a small program that demonstrates the issue
import riptable as rt
import riptide_cpp as rc
print("initializing")
arr1 = rt.FA([b'ABC'])
arr2 = rt.FA([123])
arrs = [arr1,arr2]
print("setting arr1 name")
arr1.set_name('NameABC')
print(f"arr1.name={arr1.get_name()}")
print("calling rc.MultiKeyGroupBy32")
iKey, iFirstKey, unique_count = rc.MultiKeyGroupBy32(arrs, 0, None, 2, None)
print(f"key.name={iKey.get_name()}")
It emits
initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=NameABC
so the resulting key array has the same name as the first input, which is used in the new-from-template.
In v1.1.0, the code emits
initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=None
The AllocateLikeNumpyArray() function seems to intend to create a new array with the same shape as the template array, but it uses the PyArray_NewLikeArray API with subOK=true, which implies a 'new-from-template' mechanism (see docs). That mechanism will end up invoking the __array_finalize__() method, which will potentially copy attributes from the template into the new instance. This is great for foo.view() and not so great in this case, where we want a default-initialized instance.
Instead, this API should use the lower-level PyArray_NewFromDescr and only pass in the descriptor from the template, so that we get a default-initialized instance.
This was uncovered as part of investigation into rtosholdings/riptable#285