are there workarounds to avoid copying function non-array arguments?

gdementen commented 1 year ago

Hello. Thanks for the great library.

I have code that looks like this:

import numpy as np

#pythran export look_pythran(int[], int:int dict)
#pythran export look_pythran(int list, int:int dict)
def look_pythran(key, mapping):
    # key is either a 1-D integer array or a list of ints; mapping is an int -> int dict
    if isinstance(key, np.ndarray):
        return np.array([mapping[k] for k in key])
    else:
        return [mapping[k] for k in key]

With small mappings, the performance is very nice, especially when key is large... but with large mappings, Pythran is much slower than the pure Python version. I suppose this is because Pythran copies the arguments (https://pythran.readthedocs.io/en/latest/MANUAL.html#limitations), but I haven't found a way to work around this.

For example:

- Is there a way to prepare a "Pythranized" mapping once and either pass that as an argument or via a global value, so that the mapping is copied only once instead of for every function call?
- Is there a way to pass mapping.__getitem__ as an argument instead of the mapping itself (but I don't know how to express that type in Pythran)?
- Is there a way to mark the mapping as read-only so that Pythran can use it as-is instead of first copying it?

Thanks for any help

serge-sans-paille commented 1 year ago

Interesting question, sorry for not answering earlier. There's currently no straightforward way, but it would be great to have one, indeed. Among the three solutions you propose, the first one makes the most sense to me. I wonder what would be the best syntax to support this... probably a dedicated type?
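Until such a type exists, one indirect route suggested by the issue title (only non-array arguments are said to be copied) would be to prepare the mapping once as a pair of NumPy arrays and pass those instead of the dict. The following is only a minimal sketch under that assumption, not something proposed in this thread: prepare_mapping and look_sorted are hypothetical names, the export line is an untested guess at a matching signature, and the lookup is only correct when every requested key exists in the mapping.

```python
import numpy as np

def prepare_mapping(mapping):
    """Convert an int -> int dict into two aligned arrays sorted by key (done once)."""
    keys = np.array(sorted(mapping), dtype=np.int64)
    values = np.array([mapping[k] for k in keys.tolist()], dtype=np.int64)
    return keys, values

# Hypothetical export line; it only matters if this function is compiled with Pythran.
#pythran export look_sorted(int[], int[], int[])
def look_sorted(key, sorted_keys, sorted_values):
    # Binary search for each requested key.
    # Assumption: every element of key is present in sorted_keys; a missing key
    # would silently return a wrong value or raise an IndexError.
    idx = np.searchsorted(sorted_keys, key)
    return sorted_values[idx]

mapping = {10: 1, 20: 2, 30: 3}
sorted_keys, sorted_values = prepare_mapping(mapping)  # pay the conversion cost once
result = look_sorted(np.array([30, 10, 20, 10]), sorted_keys, sorted_values)
```

The trade-off is that the dict's O(1) lookup becomes an O(log n) binary search per key, and the two arrays have to be rebuilt whenever the original dict changes.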

gdementen commented 1 year ago

Hmm. FWIW, I never answered this because I agree with you... But sometimes an explicit answer is better than an implicit one 😉.

IMO, the dedicated type is indeed the best you could offer, as it would make it possible to control when the conversion actually happens and/or when a conversion needs to happen again. Something like a "cached" modifier for the argument in the pythran function annotation (the only other option I see, but I might be short-sighted on this) would only support caching on the first call. But maybe the "cached" argument modifier could lead to more optimizations being possible? In that case, you could support both.
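To make the difference between the two options concrete, here is a pure-Python illustration of the semantics only. Nothing in it is Pythran API: ConvertedMapping, refresh and cached_convert are hypothetical names, and dict(mapping) merely stands in for the expensive Python-to-native conversion.

```python
class ConvertedMapping:
    """Hypothetical stand-in for a dedicated, pre-converted mapping type."""

    def __init__(self, mapping):
        self.refresh(mapping)

    def refresh(self, mapping):
        # Stand-in for the expensive conversion; the caller decides when it runs.
        self._native = dict(mapping)

    def __getitem__(self, k):
        return self._native[k]


_cache = {}

def cached_convert(mapping):
    """Hypothetical stand-in for a "cached" modifier: convert on the first call only."""
    key = id(mapping)  # simplistic identity-based cache key
    if key not in _cache:
        _cache[key] = dict(mapping)
    return _cache[key]


source = {1: 10}
explicit = ConvertedMapping(source)
implicit = cached_convert(source)

source[2] = 20                           # the Python dict changes after conversion
explicit.refresh(source)                 # explicit type: the caller converts again
implicit_again = cached_convert(source)  # cached modifier: stale first conversion

explicit_sees_update = explicit[2] == 20
cached_sees_update = 2 in implicit_again
```

With the explicit type the update is visible after refresh, while the cached variant keeps serving the first conversion, which is the limitation described above.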

serge-sans-paille / pythran

are there workarounds to avoid copying function non-array arguments? #2012