r-lib / cpp11

cpp11 helps you to interact with R objects using C++ code.
https://cpp11.r-lib.org/
Other
199 stars 46 forks source link

external_ptr issues #296

Open fgp opened 1 year ago

fgp commented 1 year ago

In a recent project I had some issues with external_pointer that I wanted to discuss. First, a bit of context: My goal was to make a simple C++ library consisting of a bunch of classes that do stochastic simulations accessible from R. My approach was pretty straight-forward -- for every class T I created a C++ function that calls the constructor returns an external_pointer<T>, and for every member function I created a cpp11 function that takes an external_pointer<T> as its first argument.

The issues I faced were the following:

  1. No type safety. On the R side, there is no distinction between external_pointer<T> and external_pointer<U>, so passing the wrong type of object to a cpp11 function is possible and will lead to crashes.

  2. No support for protected data attached to the external pointer. This is relevant for C++ classes that keep references to instances of other wrapped classes around. For these references to stay valid, the referenced objects must live at least until the external pointer owning the referencing object is destructed. This is exactly what external pointers guarantee for the protected data SEXP they support, but the current external_pointer implementation does not make this functionality accessible.

  3. While the semantics of R's external pointer are clear to me (I think), I didn't quite understand the semantics of cpp11's external_pointer class. I would have expected the class to have essentially the same semantics as sexp for copies and moves, i.e. basically those just copy the underlying SEXP around, but take care to protect it from garbage collection. However, it seems that external_pointer instead considers itself to own a particular external pointer SEXP, and copies it (using Rf_shallow_copy) when external_pointer is copied. What is the rational there?

  4. Probably due to (3) my attempts at wrapping a class whose instances keep references to other objects around lead to crashes. Basically, what I did was to let the external_pointer point not to the class directly, but to a holder struct that contains external_pointer instances for the object plus all the objects it references. Whether the problem was my code or external_pointer I couldn't tell because I didn't really understand the intended proper usage of external_pointer in this case.

I ended up creating my own external pointer class, safe_external_pointer (code is attached). I used external_pointer as a template, but changed the following things

  1. The tag mechanism of R's external pointers is used to provide rudimentary type safety.

  2. The protected data mechanism is exposed to the user

  3. The underlying external pointer SEXP is considered to be immutable, i.e. never changed after its creation. The copy and move semantics of safe_external_pointer are the same as that of sexp. Thus, safe_external_pointer does not really "own" the underlying external pointer SEXP, but rather is one of many references to it.

For my use case, this simplified things a lot, and it got rid of the crashes I wasn't able to debug before. Whether other use-cases would be made more complicated by these changes (in particular (3)), I don't know of course. Still, I figured I'd share my code and see if there's any interest in either including parts of it into cpp11's external_pointer, or including it as an alternative external pointer wrapper alongside the existing one. safe_external_ptr.txt

thk686 commented 1 year ago

I doubt the boost dependency will fly but who knows. In my recent project, I pushed the type safety to the R layer using R6 and that works quite well.

fgp commented 1 year ago

I doubt the boost dependency will fly but who knows.

Oh, that was just an oversight. boost isn't really required, I just forgot to remove it before attaching the file. The only place where boost is used is to demangle the name reported by typeid(). But that isn't really necessary, and in fact typeid() (and thus RTTI) could be done away with entirely, and be replaced by the address of a static member as the unique tag of a type.

In my recent project, I pushed the type safety to the R layer using R6 and that works quite well.

I've done that in the past, but have to admit I was never quite satisfied with the result. In particular when it came to documentation, the R help system doesn't seem to be very well geared for documentation R6 member functions. But yeah, it nicely deals with the type safety issue, that's true.