ChrisMuir closed this issue 6 years ago
An alternative (and maybe better) solution could be to create header files in `inst/include` that provide access to the stringdist C functions and allow direct integration by another package using `LinkingTo` in the `DESCRIPTION` file of the other package. This would give package authors access to the stringdist C functions without exporting any additional R functions. Taking this approach, I could use the C function `R_lower_tri()` directly, and write my own wrapper for use in my pkg functions.
Check out this SO post, and the answers by Romain and Dirk for more info.
And per Dirk's answer in that thread, here's an example in the wild, rcppxts linking to and using C functions from the xts pkg.
If you're open to this idea, I'd be happy to work on implementation. Let me know if there's any interest. Thanks!
Yeah exposing the C API is definitely on my list.
Cool! I can start working on implementing changes next week, and can submit a PR. Let me know if you're okay with that, or if you have any thoughts/issues/concerns. Thanks Mark!
That would be great! A couple of things/ideas/pointers:

- `stringdist` does not depend on Rcpp, by choice. I like to keep it as lightweight as possible.
- [...] `C++`.
- Using C99 and avoiding C++ may seem a bit spartan to some, but my philosophy is that we're writing a library, not an application. For a library, stability, reliability and portability are the most important features, so investing in clean and simple code with no dependencies is worth the effort.
- [...] `R_[funcname]` that are called from R using `.Call`. They accept `SEXP` arguments.
- [...] `C` code which in principle works without R. They work on standard `C` types. (One plan is to expose the core algorithms through other languages like Python, or to make it a GNU library; none of this has come remotely close to being reality, though.)
- [...] `iconv`
- [...] C API works. You first construct an object instance geared to a certain distance type (`R_open_stringdist`). Amongst other things, this also makes sure the right amount of memory gets claimed. Next, the object is used to compute distances; afterwards you need to close it.

I also have a doubt:

- [...] `R_` functions or some of the lower-level `C` functions. Do we expose an API that accepts `SEXP` objects or one that accepts `C` types, like `char *`?

I assume that many users of `LinkingTo` will be Rcpp users, so we could gear it towards that kind of use I suppose (while avoiding `C++` ourselves).
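For readers following along, the open/compute/close lifecycle described above can be sketched in plain C. This is only an illustration under stated assumptions: the names `toy_open`, `toy_dist`, `toy_close`, the `toy_method` enum and the handle struct are all hypothetical and are not stringdist's actual API. It shows why opening first is useful: the work memory is claimed once and then reused for every pair.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical distance types; NOT stringdist's actual enum. */
typedef enum { TOY_HAMMING, TOY_LEVENSHTEIN } toy_method;

/* Hypothetical handle: the chosen method plus preallocated work memory. */
typedef struct {
    toy_method method;
    size_t max_len;  /* longest second string the work buffer can hold */
    int *work;       /* one row of the Levenshtein table, or NULL */
} toy_dist_handle;

/* "Open": choose the distance type and claim the memory it needs, once. */
toy_dist_handle *toy_open(toy_method method, size_t max_len) {
    toy_dist_handle *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->method = method;
    h->max_len = max_len;
    h->work = NULL;
    if (method == TOY_LEVENSHTEIN) {
        h->work = malloc((max_len + 1) * sizeof *h->work);
        if (h->work == NULL) { free(h); return NULL; }
    }
    return h;
}

/* "Compute": reuse the handle for any number of string pairs.
   Returns -1 if the distance is undefined (Hamming on unequal lengths)
   or if b is longer than the max_len given to toy_open(). */
int toy_dist(toy_dist_handle *h, const char *a, const char *b) {
    assert(h != NULL && a != NULL && b != NULL);
    size_t na = strlen(a), nb = strlen(b);
    if (h->method == TOY_HAMMING) {
        if (na != nb) return -1;
        int d = 0;
        for (size_t i = 0; i < na; i++) d += (a[i] != b[i]);
        return d;
    }
    if (nb > h->max_len) return -1;
    int *row = h->work;
    for (size_t j = 0; j <= nb; j++) row[j] = (int)j;
    for (size_t i = 1; i <= na; i++) {
        int diag = row[0];           /* D[i-1][j-1] */
        row[0] = (int)i;
        for (size_t j = 1; j <= nb; j++) {
            int up = row[j];         /* D[i-1][j] */
            int sub = diag + (a[i - 1] != b[j - 1]);
            int del = up + 1;
            int ins = row[j - 1] + 1;
            int best = sub < del ? sub : del;
            row[j] = best < ins ? best : ins;
            diag = up;
        }
    }
    return row[nb];
}

/* "Close": release everything claimed by toy_open(). */
void toy_close(toy_dist_handle *h) {
    if (h == NULL) return;
    free(h->work);
    free(h);
}
```

A caller would open one handle, call `toy_dist()` in a loop over all pairs, and close the handle at the end, so no allocation happens inside the loop.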
Hi Mark,
Thanks for the follow up and feedback. I actually worked on this some over the weekend, and last night pushed an initial first pass at making this work, check out the diffs here. I also created a new branch in my package refinr that is LinkingTo `stringdist` and is using two of your C functions. Couple of things to mention (and keep in mind, this was just meant to be a rough proof of concept to get the ball rolling):

- [...] `stringdist` after package xts, so there's no C++ code involved, no dependency on `Rcpp`, and no new dependencies added.
- [...] `R_` C functions.
- In `refinr`, since I'm no longer directly calling any of the `stringdist` R functions, I was able to remove `stringdist` from `Imports` in my `DESCRIPTION` file (but added it to `LinkingTo`).
- [...] `@importFrom stringdist stringdistmatrix` in my pkg man docs. Once I removed that, everything broke. However, after adding `stringdist` back to the `Imports` field it still didn't work (throwing error `Error in lower_tri(x, method, weight, p, bt, q, useBytes, nthread) : function 'R_lower_tri' not provided by package 'stringdist'`), so I added `@importFrom stringdist stringdist` back to my pkg man docs, and now everything works again. It feels like the same issue with `Rcpp`, where adding it to `Imports` and `LinkingTo` isn't enough and you're required to import a specific, random function in order to make it "click". I'll keep working on this.
- [...] `stringdist` can call `stringdist` C functions within its own C functions (see the `xts` example package). In my package `refinr`, I'm able to do that, but I can also call `stringdist` C functions within R using `.Call()`. I think this is only because both of the `stringdist` functions I'm using are getting picked up by `RcppExports.cpp`, but I'm not sure about that.

To briefly map out how `refinr` is using the `stringdist` C API, see the last two functions in my utils.R file. Func `lower_tri()` is calling `sd_lower_tri()`. Func `get_list_lengths()` is calling a C function I added to `refinr` in file `refinr/src/stringdist.c`, which is calling `sd_lengths()`. Both of the `sd_` functions are pointing back to your C functions (the `R_` versions) via file `stringdist/pkg/inst/include/stringdist_api.h`. I'm not using `get_list_lengths()` in my pkg, but added the function just to test, and it's working:
refinr:::get_list_lengths(list(c(1, 2, 3), c("cats", "dogs")))
#> [1] 3 2
I worked on these edits on a PC, and everything works so far. I plan to test it all on a Mac today. I also set up my forked `stringdist` repo on Travis, and the `c_api` branch passed, so that's a good sign.
Quick update: just tested on a Mac. Installing the `c_api` branch of both `ChrisMuir/stringdist/pkg` and `refinr` works fine, and the edited functions in `refinr` work as expected.
Also, regarding the whole `Imports` issue for `refinr`: if I include `@import stringdist` in my pkg man docs, I can then remove `stringdist` from the `Imports` field of the `DESCRIPTION` file. Here's the commit with those edits. I'm not really sure if it's best to set it up this way, or to leave `stringdist` on the `Imports` list.
I think we can close this issue now. The C API has been built out and is exposed as of version 0.9.5.1.
Have you given any thought to exposing some of the internal functions that do the heavy lifting? I ask because I maintain a package that uses `stringdistmatrix()` like so:

[...]

Obj `initial_clust` is a list of char vectors of various lengths, and when input is large this code chunk becomes a bottleneck. Just for testing purposes, I tried swapping in `stringdist:::lower_tri()` and saw a 23% speed boost, and I don't need to push all of the inputs through all of the validations at the top of `stringdistmatrix()` every time it's called.

I can't use `:::` to import `lower_tri()` because CRAN will complain. If it were just a single C function I could simply copy the function (with credit to the authors obv), but I'm currently giving users the option to choose any of the ten methods via ellipses.

I'd love to continue importing `stringdist` and be able to use `lower_tri()`.
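To make the request concrete, here is a rough sketch in plain C of the kind of loop a lower-triangle routine performs: all pairwise distances for one vector of strings, stored in the column-major order that R's `dist` objects use, with no per-call input validation. Everything here is an assumption for illustration: `toy_lower_tri`, `toy_dist_fn` and the stand-in distance `toy_len_diff` are hypothetical names and do not reflect how stringdist implements `lower_tri()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for any pairwise string distance. */
typedef int (*toy_dist_fn)(const char *, const char *);

/* Stand-in distance for the demo only: absolute difference in length. */
int toy_len_diff(const char *a, const char *b) {
    size_t na = strlen(a), nb = strlen(b);
    return (int)(na > nb ? na - nb : nb - na);
}

/* Fill out[] with the n*(n-1)/2 distances below the diagonal, ordered
   (2,1), (3,1), ..., (n,1), (3,2), ... as in an R dist object.
   Returns the number of entries written. */
size_t toy_lower_tri(const char *const *x, size_t n,
                     toy_dist_fn dist, int *out) {
    assert(dist != NULL && (n < 2 || (x != NULL && out != NULL)));
    size_t k = 0;
    for (size_t j = 0; j + 1 < n; j++) {        /* column */
        for (size_t i = j + 1; i < n; i++) {    /* rows below the diagonal */
            out[k++] = dist(x[i], x[j]);
        }
    }
    return k;
}
```

Passing the distance as a function pointer is one way a wrapper could keep offering several methods while reusing a single loop.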