rtosholdings / riptide_cpp

Multihreaded 64 bit c++ files for processing numba arrays
BSD 3-Clause "New" or "Revised" License
17 stars 4 forks source link

AllocateLikeNumpyArray() may return arrays with unwanted baggage #81

Open OrestZborowski-SIG opened 2 years ago

OrestZborowski-SIG commented 2 years ago

The AllocateLikeNumpyArray() function seems to intend to create a new array with the same shape as the template array, but it uses the PyArray_NewLikeArray API with subOK=true, which implies a 'new-from-template' mechanism (see docs). That mechanism will end up invoking the __array_finalize__() method, which will potentially copy attributes from the template into the new instance. This is great for foo.view() and not so great in this case, where we want a default-initialized instance.

Instead, this API should use the lower-level PyArray_NewFromDescr and only pass in the descriptor from the template, so that we get a default-initialized instance.

This was uncovered as part of investigation into rtosholdings/riptable#285

OrestZborowski-SIG commented 2 years ago

This is a small program that demonstrates the issue

import riptable as rt
import riptide_cpp as rc

print("initializing")
arr1 = rt.FA([b'ABC'])
arr2 = rt.FA([123])
arrs = [arr1,arr2]

print("setting arr1 name")
arr1.set_name('NameABC')
print(f"arr1.name={arr1.get_name()}")

print("calling rc.MultiKeyGroupBy32")
iKey, iFirstKey, unique_count = rc.MultiKeyGroupBy32(arrs, 0, None, 2, None)
print(f"key.name={iKey.get_name()}")

It emits

initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=NameABC

so the resulting key array has the same name as the first input, which is used in the new-from-template.

In v1.1.0, the code emits

initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=None