r-lib / cpp11

cpp11 helps you to interact with R objects using C++ code.
https://cpp11.r-lib.org/
Other
193 stars 46 forks source link

Issues with cpp11 0.4.4 changes to the preserve list structure #330

Closed DavisVaughan closed 11 months ago

DavisVaughan commented 11 months ago

The following will error if arrow is compiled with cpp11 0.4.4

conn <- duckdb::dbConnect(duckdb::duckdb())

path <- gbifdb::gbif_example_data()
dataset <- arrow::open_dataset(path)

duckdb::dbDisconnect(conn, shutdown = TRUE)

# Now restart R

It should error on restart with:

Error: bad value

This error is related to SETCAR() calls, which we confirmed with an lldb backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001014589ec libR.dylib`Rf_error(format="bad value") at errors.c:961:26 [opt]
    frame #1: 0x00000001014bded4 libR.dylib`SETCAR(x=0x000000013000a0e0, y=0x00000001309d81c8) at memory.c:4285:2 [opt]
    frame #2: 0x0000000123188814 arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared() [inlined] cpp11::$_0::release(this=<unavailable>, cell=<unavailable>) at protect.hpp:388:5 [opt]
    frame #3: 0x00000001231887d0 arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared() [inlined] cpp11::sexp::~sexp(this=<unavailable>) at sexp.hpp:58:23 [opt]
    frame #4: 0x00000001231887cc arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared() [inlined] cpp11::sexp::~sexp(this=<unavailable>) at sexp.hpp:58:11 [opt]
    frame #5: 0x00000001231887cc arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared() [inlined] cpp11::environment::~environment(this=<unavailable>) at environment.hpp:21:7 [opt]
    frame #6: 0x00000001231887cc arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared() [inlined] cpp11::environment::~environment(this=<unavailable>) at environment.hpp:21:7 [opt]
    frame #7: 0x00000001231887cc arrow.so`std::__1::__shared_ptr_emplace<cpp11::environment, std::__1::allocator<cpp11::environment> >::__on_zero_shared(this=<unavailable>) at memory:3318:23 [opt]
    frame #8: 0x0000000123183c2c arrow.so`RExtensionType::~RExtensionType() [inlined] std::__1::__shared_count::__release_shared(this=0x00006000001faac0) at memory:3169:9 [opt]
    frame #9: 0x0000000123183c1c arrow.so`RExtensionType::~RExtensionType() [inlined] std::__1::__shared_weak_count::__release_shared(this=0x00006000001faac0) at memory:3211:27 [opt]
    frame #10: 0x0000000123183c1c arrow.so`RExtensionType::~RExtensionType() [inlined] std::__1::shared_ptr<cpp11::environment>::~shared_ptr(this=0x000060000388c830) at memory:3884:19 [opt]
    frame #11: 0x0000000123183c1c arrow.so`RExtensionType::~RExtensionType() [inlined] std::__1::shared_ptr<cpp11::environment>::~shared_ptr(this=0x000060000388c830) at memory:3882:1 [opt]
    frame #12: 0x0000000123183c1c arrow.so`RExtensionType::~RExtensionType(this=0x000060000388c790) at extension.h:33:7 [opt]
    frame #13: 0x0000000123182e8c arrow.so`RExtensionType::~RExtensionType() [inlined] RExtensionType::~RExtensionType(this=<unavailable>) at extension.h:33:7 [opt]
    frame #14: 0x0000000123182e88 arrow.so`RExtensionType::~RExtensionType(this=<unavailable>) at extension.h:33:7 [opt]
    frame #15: 0x0000000123224ea8 arrow.so`std::__1::shared_ptr<parquet::internal::RecordReader>::~shared_ptr() + 56
    frame #16: 0x0000000123507510 arrow.so`void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, void*> > >::destroy<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::shared_ptr<arrow::ExtensionType> > >(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, void*> >&, std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::shared_ptr<arrow::ExtensionType> >*) + 24
    frame #17: 0x00000001235074dc arrow.so`std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> > > >::__deallocate_node(std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, void*>*>*) + 32
    frame #18: 0x000000012350749c arrow.so`std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<arrow::ExtensionType> > > >::~__hash_table() + 24
    frame #19: 0x00000001235071d4 arrow.so`arrow::ExtensionTypeRegistryImpl::~ExtensionTypeRegistryImpl() + 44
    frame #20: 0x0000000123224ea8 arrow.so`std::__1::shared_ptr<parquet::internal::RecordReader>::~shared_ptr() + 56
    frame #21: 0x00000001a60fdec4 libsystem_c.dylib`__cxa_finalize_ranges + 476
    frame #22: 0x00000001a60fdc4c libsystem_c.dylib`exit + 44
    frame #23: 0x00000001015980b8 libR.dylib`Rstd_CleanUp(saveact=SA_NOSAVE, status=0, runLast=<unavailable>) at sys-std.c:1246:5 [opt]
    frame #24: 0x000000010159ae60 libR.dylib`R_CleanUp(saveact=<unavailable>, status=0, runLast=<unavailable>) at system.c:87:5 [opt]
    frame #25: 0x00000001014b539c libR.dylib`do_quit(call=<unavailable>, op=<unavailable>, args=0x0000000112b84bd8, rho=<unavailable>) at main.c:1487:5 [opt]
    frame #26: 0x0000000101466a24 libR.dylib`bcEval(body=0x0000000112b84f58, rho=<unavailable>, useCache=<unavailable>) at eval.c:7446:14 [opt]
    frame #27: 0x000000010145f048 libR.dylib`Rf_eval(e=0x0000000112b84f58, rho=0x0000000112b84cb8) at eval.c:1013:8 [opt]
    frame #28: 0x000000010147bccc libR.dylib`R_execClosure(call=0x0000000112b81400, newrho=0x0000000112b84cb8, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x0000000112b850a8) at eval.c:0 [opt]
    frame #29: 0x000000010147a54c libR.dylib`Rf_applyClosure(call=0x0000000112b81400, op=0x0000000112b850a8, arglist=0x000000013000a0e0, rho=0x0000000130042f88, suppliedvars=<unavailable>) at eval.c:2113:16 [opt]
    frame #30: 0x000000010145f31c libR.dylib`Rf_eval(e=0x0000000112b81400, rho=0x0000000130042f88) at eval.c:1140:12 [opt]
    frame #31: 0x00000001014b33b4 libR.dylib`Rf_ReplIteration(rho=0x0000000130042f88, savestack=<unavailable>, browselevel=<unavailable>, state=0x000000016ee85b10) at main.c:262:2 [opt]
    frame #32: 0x00000001014b4928 libR.dylib`R_ReplConsole(rho=0x0000000130042f88, savestack=0, browselevel=0) at main.c:314:11 [opt]
    frame #33: 0x00000001014b4864 libR.dylib`run_Rmainloop at main.c:1200:5 [opt]
    frame #34: 0x00000001014b49d0 libR.dylib`Rf_mainloop at main.c:1207:5 [opt]
    frame #35: 0x0000000100f7bea0 R`main + 32
    frame #36: 0x00000001a5ee7f28 dyld`start + 2236

That error is due to the changes in this PR https://github.com/r-lib/cpp11/pull/285, which slightly tweaked the structure of the preserve list.

duckdb vendors cpp11 0.4.2, which uses the "old" structure of the preserve list. It gets loaded first (and presumably something internally calls cpp11) so the preserve list is initialized to the "old" structure. Then we load arrow and that internally uses cpp11 0.4.4 which included that PR. Since the structures differ, the insertion/release calls seem to not be aligned quite right, so as R exits and the destructors for the arrow objects are all run, we end up with an error in SETCAR() which seems to come from the release() call in ~sexp().

The PR included an attempt to be backwards compatible with old versions of cpp11, but it seems like it wasn't quite right (and is quite hard to get right, as we see here).


We have had many issues with "shared cpp11 state" over the years, and the fact that a package built with, say, cpp11 0.4.2, has to share state with a package built with cpp11 0.4.4 makes it very hard to introduce changes to the shared state objects in a backwards compatible way.

In #327 we faced a related issue, and our solution was actually to remove that shared state altogether.

Lionel, Kevin, and I have come up with a proposal to do the same thing here. Rather than having 1 "global" preserve list that is shared across multiple package and across compilation units within the same package, we propose that we go back to the original implementation of the preserve list (before it was changed here https://github.com/r-lib/cpp11/pull/92), which creates one preserve list per compilation unit. The crux of the implementation would look like

  static SEXP get_preserve_list() {
    static SEXP out = R_NilValue;
    if (out == R_NilValue) {
      out = new_preserve_list();
      R_PreserveObject(out);
    }
    return out;
  }

We think this should get rid of the shared state problem altogether:

Two very minor downsides:


Some other related issues: