roualdes / bridgestan

BridgeStan provides efficient in-memory access through Python, Julia, and R to the methods of a Stan model.
https://roualdes.github.io/bridgestan
BSD 3-Clause "New" or "Revised" License

Python callback into BridgeStan with C types #190

Closed: roualdes closed this issue 7 months ago

roualdes commented 7 months ago

I have an implementation of Stan that iterates and does warmup calculations in Python and leapfrog/trajectories in C++. The gradient calculations implicit in each leapfrog step happen via a Python callback which then calls BridgeStan: Python -> C++ -> Python/BridgeStan -> C/C++/BridgeStan.

Since the C++ code calls a BridgeStan function through that callback, and Python/BridgeStan expects numpy arrays, I have to convert a double* to a numpy array just so that BridgeStan can convert the numpy array back to a double*. The conversion to a numpy array is unnecessarily costly, especially since everything uses the same underlying memory.
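
As a sketch of where that round trip happens (the callback signature and names below are hypothetical, just for illustration):

import ctypes
import numpy as np

# Hypothetical signature for the gradient callback the C++ integrator invokes:
# it hands Python raw double* buffers for the position and the gradient.
GradFn = ctypes.CFUNCTYPE(
    ctypes.c_double,                  # returns the log density
    ctypes.POINTER(ctypes.c_double),  # theta_unc (input buffer)
    ctypes.POINTER(ctypes.c_double),  # grad (output buffer)
    ctypes.c_int,                     # dimension
)

def make_grad_callback(model):
    def grad_callback(theta_ptr, grad_ptr, n):
        # Today's round trip: wrap both pointers in numpy views just so the
        # Python wrapper can turn them back into double* before calling C.
        theta = np.ctypeslib.as_array(theta_ptr, shape=(n,))
        grad = np.ctypeslib.as_array(grad_ptr, shape=(n,))
        lp, grad_out = model.log_density_gradient(theta)
        grad[:] = grad_out
        return lp
    return GradFn(grad_callback)

# c_grad = make_grad_callback(model)  # model is a bridgestan.StanModel; c_grad is handed to C++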

Consider BridgeStan's python/bridgestan/model.py lines 219 to 229. This code chunk defines the Python/BridgeStan interface to the C/BridgeStan function bs_log_density_gradient. Lines 225 and 227 specify numpy arrays as argument types from Python/BridgeStan -> C/C++/BridgeStan.
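
For readers without the source open, those lines look roughly like this (a paraphrase, not the exact code; double_array is the numpy ndpointer type BridgeStan defines, and star_star_char is ctypes.POINTER(ctypes.c_char_p)):

self._log_density_gradient = self.stanlib.bs_log_density_gradient
self._log_density_gradient.restype = ctypes.c_int
self._log_density_gradient.argtypes = [
    ctypes.c_void_p,                  # bs_model* handle
    ctypes.c_int,                     # propto
    ctypes.c_int,                     # jacobian
    double_array,                     # theta_unc input: must be a numpy ndarray
    ctypes.POINTER(ctypes.c_double),  # log density output
    double_array,                     # gradient output: must be a numpy ndarray
    star_star_char,                   # error message
]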

Simply adding the following code to python/bridgestan/model.py::StanModel's __init__() method allows a Python callback from C++ to re-use the original double*:

self._ldg = self.stanlib["bs_log_density_gradient"]
self._ldg.restype = ctypes.c_int
self._ldg.argtypes = [
    ctypes.c_void_p,
    ctypes.c_int,
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double), # changed from double_array
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double), # changed from double_array
    star_star_char,
]

What do you think of building such functionality into BridgeStan? Just adding the above code won't break our API in any way, but it does offer a whole new world of possibility for BridgeStan. I'm not sold on the name _ldg; I'm just asking about the possibility of including such functionality.

If y'all are into this idea, would you brainstorm with me what a better design might look like that allows callbacks, from the various higher-level languages, into BridgeStan using plain C types?

bob-carpenter commented 7 months ago

What do you think of building such functionality into BridgeStan?

I couldn't quite follow all the callbacks, so I'm not entirely sure what you are suggesting adding to BridgeStan and what that would enable for clients that they can't achieve now.

WardBrian commented 7 months ago

My understanding of Edward's request is that a user may be working with other code that uses ctypes, and as a result has a double array represented as a ctypes.POINTER(ctypes.c_double) rather than as a numpy array object.

There is nothing at the C level of BridgeStan to differentiate between these, and in principle either is fine, but for convenience BridgeStan's Python interface only speaks the language of numpy ndarrays, not 'raw' ctypes pointers. You can convert from a ctypes pointer to an ndarray, but since all BridgeStan really does is some bounds checking before converting it back to a pointer internally, this comes with an extra cost at no extra benefit.

The concrete code Ed provided would give you a different function which accepts ctypes objects as opposed to numpy ones. I also have a minimal patch prepared that would make either acceptable anywhere we currently accept numpy arrays, meaning no extra functions, just slightly broader types being accepted.

roualdes commented 7 months ago

I couldn't quite follow all the callbacks

Ya, sorry about that. I'm struggling to explain this fairly complicated process. Thankfully, Brian is experienced at filling in the gaps between what Edward says and what Edward means.

Skipping the details, there is nothing clients can't achieve now. There is an opportunity, as I see it, to make calling BridgeStan both simpler and faster in a very specific use case.

bob-carpenter commented 7 months ago

Got it. Thanks!

WardBrian commented 7 months ago

I'm not sure if this is an issue outside of Python, since it really stems from numpy being ubiquitous but not actually in the standard library.

That said, the following snippet of code only uses our public API, so there is nothing actually requiring this to live inside BridgeStan to unlock the same functionality:

model = bridgestan.StanModel(...)
ctypes_log_density_gradient = model.stanlib["bs_log_density_gradient"]
# set up metadata as Edward does above, using ctypes.POINTER(ctypes.c_double)

The only thing you really lose here is the Python wrapping around the exceptions; you'd need to manually check the return codes, etc. OTOH, if you're writing something that calls this from inside other compiled code, that may actually be how you'd like things to be set up.
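
For concreteness, a sketch of what that might look like (the helper, the placeholder paths, and the way the bs_model* handle reaches the helper are illustrative, not part of the documented Python API):

import ctypes
import bridgestan

model = bridgestan.StanModel("./bernoulli_model.so", "./bernoulli.data.json")  # placeholder paths

# Indexing the CDLL returns a fresh function object, so configuring it here
# leaves the argtypes of BridgeStan's own numpy-based wrapper untouched.
ldg = model.stanlib["bs_log_density_gradient"]
ldg.restype = ctypes.c_int
ldg.argtypes = [
    ctypes.c_void_p,                   # bs_model* handle
    ctypes.c_int,                      # propto
    ctypes.c_int,                      # jacobian
    ctypes.POINTER(ctypes.c_double),   # theta_unc (input)
    ctypes.POINTER(ctypes.c_double),   # log density (output)
    ctypes.POINTER(ctypes.c_double),   # gradient (output)
    ctypes.POINTER(ctypes.c_char_p),   # error message (output)
]

def raw_log_density_gradient(model_ptr, theta_ptr, grad_ptr):
    """Hypothetical helper: call bs_log_density_gradient on raw pointers.

    model_ptr is the opaque bs_model* handle the caller holds; theta_ptr and
    grad_ptr are ctypes double pointers of length param_unc_num(), e.g. buffers
    handed over from C++ inside a callback.
    """
    lp = ctypes.c_double()
    err = ctypes.c_char_p()
    rc = ldg(model_ptr, 1, 1, theta_ptr, ctypes.byref(lp), grad_ptr, ctypes.byref(err))
    if rc != 0:
        # No exception wrapping here: the return code has to be checked by hand.
        raise RuntimeError(err.value.decode() if err.value else "unknown error")
    return lp.value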

roualdes commented 7 months ago

Right, this doesn't seem to be an issue outside of Python, for the reasons you mention. Good call.

Brian proposed some code, in branch python/allow-ctypes-double-pointers, that allows direct access to BridgeStan's log_density_gradient() for both numpy arrays and ctypes pointers. The solution involves an extra if statement relative to what lives in the main branch. To help us determine whether or not we want to include such functionality in BridgeStan, I ran some experiments and report the results below.

I ran some numerical simulations to better understand the computational cost trade-offs associated with this extra if statement. From a numpy array user's perspective, we expect a slight slowdown (one extra if statement). From a ctypes user's perspective, we expect a major gain (no unnecessary numpy array creation). The code used for the simulations lives in the GitHub repository bridgestan-speed. The simulations ran across three models: Gaussian with data (2 parameters), logistic regression (25 parameters), and standard normal with no data (1_000 parameters).
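
The numpy-side timing loop amounts to something like the following sketch (the actual benchmark code is in bridgestan-speed; the paths and model here are placeholders):

import timeit
import numpy as np
import bridgestan

model = bridgestan.StanModel("./logistic_model.so", "./logistic.data.json")  # placeholder paths
theta = np.random.randn(model.param_unc_num())

def one_run(evals=1_000):
    # one "run" = 1_000 gradient evaluations through the numpy interface
    for _ in range(evals):
        model.log_density_gradient(theta)

# 500 runs, each timed individually
times = timeit.repeat(one_run, number=1, repeat=500)
print(f"{np.mean(times) * 1e3:.1f} ms +/- {np.std(times) * 1e3:.2f} ms per run")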

Numpy array perspective. 500 runs of 1_000 evaluations of log_density_gradient across two branches. Times reported are means per run plus/minus standard deviations.

dimensions / branch             main                    python/allow-ctypes-double-pointers
2 (normal w/ data)              13.9 ms +/- 40.9 μs     14.6 ms +/- 493 μs
25 (logistic regression)        96.2 ms +/- 8.42 ms     82.7 ms +/- 1.13 ms
1_000 (std_normal no data)      27.5 ms +/- 1.48 ms     27.1 ms +/- 721 μs

Ctypes perspective. 20 runs of Stan sampling with 1_000 warmup iterations and 1_000 sampling iterations across two branches. Times reported are means per Stan run plus/minus standard deviations. RNG seeds were coordinated to ensure neither branch lucked into a worse part of the model parameter space than the other.


dimensions / branch             main                    python/allow-ctypes-double-pointers
2 (normal w/ data)              251 ms +/- 7.36 ms      89.9 ms +/- 5.3 ms
25 (logistic regression)        2.56 s +/- 132 ms       2 s +/- 74.9 ms
1_000 (std_normal no data)      1.51 s +/- 30.3 ms      908 ms +/- 92 ms

What do you all think, should we add this functionality to BridgeStan?

bob-carpenter commented 7 months ago

Thanks for the report @roualdes. I couldn't quite tell what was being compared. I see two tables of results, but don't know what they mean. What does "across two branches" mean for the "Numpy array perspective"? Is it that the first one is passing in a numpy array and the second passing in some raw memory?

roualdes commented 7 months ago

What does "across two branches" mean for the "Numpy array perspective"?

The first table is meant to show costs associated with the proposed solution to this issue for the most common Python/BridgeStan user, a user who is only dealing with numpy arrays. The table presents mean run times for calling BridgeStan's log_density_gradient with numpy arrays. The first column sets the baseline for run times in the branch main. The second column displays run times for the proposed solution from branch python/allow-ctypes-double-pointers.

Is it that the first one is passing in an numpy array and the second passing in some raw memory?

Correct. The second table is meant to show benefits associated with the proposed solution when a user has a double* to raw memory.

Let me know if I can offer any other words to help clarify what my results are about.

bob-carpenter commented 7 months ago

Thanks---that's what I suspected and this looks close enough to be worth doing.

If the results weren't clear enough here, I'd be tempted to do about 10 times as many evals over several batches because of the inconsistent sd in the tests you list (e.g., "13.9 ms +/- 40.9 μs" vs. "14.6 ms +/- 493 μs" in the first line of the first table, where the new method has more than ten times the standard deviation). This kind of micro-benchmarking is notoriously tricky, so it's probably not worth it.

WardBrian commented 7 months ago

The results align with my expectations: for numpy arrays it is a wash, and for raw pointers the overhead is much lower.

If it had ended up being a significant slowdown for numpy arrays, I would be more hesitant about adding it. However, it's really just adding one more if to a part of the code that already has a bunch of ifs for validation logic, so I'm not shocked the impact is essentially nothing.
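
For anyone curious, the shape of that extra branch is roughly the following; this is a sketch of the idea, not the actual code in python/allow-ctypes-double-pointers:

import ctypes
import numpy as np

def as_double_ptr(x, expected_len):
    """Hypothetical helper: let either a numpy array or a ctypes double
    pointer through, doing the existing validation only on the numpy path."""
    if isinstance(x, ctypes.POINTER(ctypes.c_double)):
        return x  # raw pointer: trust the caller; no wrapping, no copy
    arr = np.asarray(x, dtype=np.float64)
    if arr.shape != (expected_len,):
        raise ValueError(f"expected shape ({expected_len},), got {arr.shape}")
    # the pointer returned by data_as keeps a reference to arr alive
    return arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))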