In our NumPy/HPy port, we've seen that calling ufunc objects is significantly slower because we don't support the vectorcall protocol. Although HPy's `VARARGS` calling convention is very similar to the vectorcall calling convention on the receiver side, there is still a big difference on the caller side: CPython still internally allocates a tuple for the positional arguments and a dict for any keyword arguments before invoking the callee.
Besides that, PEP 590 describes further shortcomings of the `tp_call` calling convention.
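To illustrate the caller-side difference, here is a toy C model. For reference, CPython's `tp_call` slot has the shape `PyObject *(*)(PyObject *callable, PyObject *args, PyObject *kwargs)`, where `args` is a tuple, while `vectorcallfunc` is `PyObject *(*)(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames)`. The sketch below does not use the real CPython API. `Obj`, `Tuple`, `tuple_pack`, and the `sum_*`/`call_via_*` functions are hypothetical stand-ins, and keyword arguments are omitted. It only shows that the `tp_call`-style path must pack the caller's arguments into a freshly allocated container, while the vectorcall-style path hands the caller's array straight to the callee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Python object (NOT PyObject): just a long. */
typedef struct { long value; } Obj;

/* Hypothetical stand-in for a tuple: a heap-allocated, length-prefixed array. */
typedef struct { size_t n; Obj *items[]; } Tuple;

/* Counts how many "tuples" the callers had to allocate. */
static size_t tuple_allocations = 0;

/* Copy the caller's arguments into a new heap-allocated Tuple. */
static Tuple *tuple_pack(Obj *const *args, size_t n) {
    Tuple *t = malloc(sizeof(Tuple) + n * sizeof(Obj *));
    if (t == NULL) return NULL;
    t->n = n;
    if (n > 0) memcpy(t->items, args, n * sizeof(Obj *));
    tuple_allocations++;
    return t;
}

/* tp_call-like callee: receives a packed tuple of arguments. */
static long sum_tpcall(const Tuple *args) {
    long s = 0;
    for (size_t i = 0; i < args->n; i++) s += args->items[i]->value;
    return s;
}

/* vectorcall-like callee: reads the caller's array in place. */
static long sum_vectorcall(Obj *const *args, size_t nargs) {
    long s = 0;
    for (size_t i = 0; i < nargs; i++) s += args[i]->value;
    return s;
}

/* Caller side, tp_call path: allocate, call, free. Returns -1 on OOM. */
static long call_via_tpcall(Obj *const *args, size_t nargs) {
    assert(nargs == 0 || args != NULL);
    Tuple *t = tuple_pack(args, nargs);
    if (t == NULL) return -1;
    long r = sum_tpcall(t);
    free(t);
    return r;
}

/* Caller side, vectorcall path: no intermediate allocation at all. */
static long call_via_vectorcall(Obj *const *args, size_t nargs) {
    assert(nargs == 0 || args != NULL);
    return sum_vectorcall(args, nargs);
}
```

Both paths compute the same result, but only the `tp_call`-style path increments `tuple_allocations`. In real CPython the packing cost also includes reference counting and, when keywords are passed, building a dict. That per-call overhead is what vectorcall avoids.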