inducer / islpy

Python wrapper for isl, an integer set library
http://pypi.python.org/pypi/islpy
73 stars 19 forks source link

native upcasts #113

Closed matthiasdiener closed 1 year ago

matthiasdiener commented 1 year ago

Is this the approach you had in mind @inducer? (the actual code to create the upcasts would be added to gen_wrap.py)

1. Substitute for make_new_upcast_wrapper:

Option a)
  wrap_basic_set.def("foreach_point", [](isl::basic_set &self, py::object fn) 
                     { return isl::set_foreach_point(self, fn); });
Option b)
  wrap_basic_set.def("foreach_point", isl::set_foreach_point);

(I think the actual cast is handled by impicitly_convertible)

2. Substitute for make_existing_upcast_wrapper:

Option a)
wrap_basic_set.def("union", [](isl::basic_set &self, isl::union_set &arg_set2)
                                { return isl::union_set_union(self, arg_set2); });
Option b)
wrap_basic_set.def("union", isl::union_set_union);

Edit: I implemented option b) for both cases.

inducer commented 1 year ago

Yes, something like that would work.

matthiasdiener commented 1 year ago

When built against nanobind (+ #116), this provides a modest speedup for the microbenchmark on M1:

inducer commented 1 year ago

I'm OK with option b) in both cases. I'm not sure option a) would work unmodified. You would likely need to call the casting function manually somewhere in there in order to get it to work, and this may be incrementally faster than option b). But a lot depends on exactly how much faster, and whether that justifies the extra effort (and generated code).

At the same time, there's a small amount of speed-up here, and we're net deleting code: I'd call it a win regardless and would be happy to merge this, I think.

matthiasdiener commented 1 year ago

I'm OK with option b) in both cases. I'm not sure option a) would work unmodified. You would likely need to call the casting function manually somewhere in there in order to get it to work, and this may be incrementally faster than option b). But a lot depends on exactly how much faster, and whether that justifies the extra effort (and generated code).

Option a) works in the way it is written above (possibly also due to making use of implicitly_convertible?). Both options are very close performance-wise in my test.

inducer commented 1 year ago

Thanks!