Closed DavisVaughan closed 5 years ago
That is a good idea. I was struggling with the same problem before.
I think it should definitely be possible.
We struggle with this in Rcpp land too. The trouble is that we are smushing the lines between compile and run-time. C++ wants a compile time check, but R will only tell us at run-time what the type is. Hence the need for `SEXP` in interfaces, and dispatch inside the function.
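To make the "dispatch inside the function" idea concrete, here is a minimal, self-contained sketch. It does not use R or Rcpp: `Tag`, `Value`, `sum_impl` and `sum_dispatch` are hypothetical stand-ins, with `Value` playing the role of a `SEXP` whose element type is only known at run-time.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for R's run-time type tags.
enum class Tag { Integer, Real };

// Hypothetical stand-in for a SEXP: a run-time tag plus the payload.
struct Value {
    Tag tag;
    std::vector<std::int32_t> ints;  // used when tag == Tag::Integer
    std::vector<double> reals;       // used when tag == Tag::Real
};

// Compile-time generic implementation: the element type T is known here.
template <typename T>
double sum_impl(const std::vector<T>& x) {
    double total = 0.0;
    for (const T& v : x) total += static_cast<double>(v);
    return total;
}

// Run-time entry point: inspect the tag, then pick the matching instantiation.
double sum_dispatch(const Value& v) {
    switch (v.tag) {
        case Tag::Integer: return sum_impl<std::int32_t>(v.ints);
        case Tag::Real:    return sum_impl<double>(v.reals);
    }
    throw std::invalid_argument("unsupported type tag");
}
```

The interface takes the untyped value, and the templated code is only reached once the tag has been checked, which is the pattern described above.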
In short, it looks like @DavisVaughan re-invented what we have explained, e.g., in this somewhat classic piece from 2013 by Kevin at the Rcpp Gallery (which is referenced in a number of StackOverflow and mailing list answers).
I fear we can't do much better than this.
Hi Dirk, great to see you in this issue -- I was about to ping you for ideas :)
I think we can do a bit better:
`double`, `int(32)`, `bool` or `complex` as I understand those are the only data types in R (of interest here). E.g. `uint64` and others should all be forbidden.
If I use the `TYPEOF` macro on a SEXP do I get the element type of the R vector/matrix?
I am hopeful we can do better :) So far we haven't.
Type restrictions are fine. The list is what R has. (For `int64_t` we cheat via "overloaded" interpretation, nothing has been written for `uint64_t` :-( ).
`pybind11` is on my list of things to look at, but as I hardly use Python that list only gets `push_back` and no `pop_front` :-( So I am not sure how it differs.
Yup. `TYPEOF` is your friend. Should be in a number of Rcpp Gallery posts, StackOverflow answers and of course the Rcpp sources. It is an R macro, as R started this business with the "`union`-alike" `SEXP`.
A bit offtopic.
> For int64_t we cheat via "overloaded" interpretation

For 32 bit `float` there is a float pkg which takes the same "cheating" approach: it uses R's integer vectors as storage. I use the float pkg with Armadillo "mapped" matrices without any issue. So I believe we can have interoperability of `float` and `xtensor`.
@dselivanov interesting. This approach would make the proposed type checking completely useless, right? Because we couldn't detect a "float" vector (which is actually an int32 vector to R)?
I think for this kind of advanced R usage we, as the xtensor team, are in need of advanced R people! It would be awesome if you could help out @dselivanov :)
No, no. It's a side issue, just like my mention of the `integer64` hack.
Fact: R has int32, double, complex, bool. All support NA (and most support NaN) so bool is three valued just to mess with us :)
Fact: They all travel in/out as SEXP and you can use TYPEOF at run-time to inquire about payload.
Fact: Add-on packages (bit64 for integer, float for float) cheat by sticking 64bit ints into a double, and 32bit floats into an int32.
Fact: None of that helps with `rarray`.
But mentioning these side-hacks shows different approaches in bending the rules a little.
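As an illustration of the storage "cheat" mentioned for bit64, here is a small standalone sketch of keeping the bit patterns of 64-bit integers inside `double`-sized slots. It is not the bit64 implementation; `pack_int64` and `unpack_int64` are hypothetical names, and the sketch assumes `double` is 64 bits wide.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "sketch assumes a 64-bit double");

// Copy the raw bits of each 64-bit integer into a double-sized slot.
// The numeric value of each double is meaningless; only the bits matter.
std::vector<double> pack_int64(const std::vector<std::int64_t>& values) {
    std::vector<double> storage(values.size());
    if (!values.empty())
        std::memcpy(storage.data(), values.data(),
                    values.size() * sizeof(std::int64_t));
    return storage;
}

// Recover the integers by copying the bits back out.
std::vector<std::int64_t> unpack_int64(const std::vector<double>& storage) {
    std::vector<std::int64_t> values(storage.size());
    if (!storage.empty())
        std::memcpy(values.data(), storage.data(),
                    storage.size() * sizeof(double));
    return values;
}
```

A run-time type query on such a vector would still report `double`, so a type check based on the storage type alone cannot tell the two interpretations apart. That is why these packages are a side issue for the check discussed here.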
I also like pybind11's dynamic dispatch approach, although it is not always ideal because of the overhead.
Maybe we could make a dynamic dispatch in xtensor-r for the data type?
Thanks @eddelbuettel for chiming in!
Happy to help, particularly as "talk is cheap" :)
Dynamic dispatch may be worth it, definitely for an exploration. I am not really sure if that has been tried (and I am not following Arrow all that closely so I am not sure what they do over there). If it works for pybind11, and as you already put the C++14 marker down (which will "eventually" be less of an issue as all compilers catch up) it may be a good route.
Hi @eddelbuettel & @DavisVaughan
I've implemented the safe guards in this PR: https://github.com/QuantStack/xtensor-r/pull/61/files
Do you guys want to quickly review the change? @eddelbuettel is this an appropriate way of raising an error to R? It seems to work fine!
Regarding what you've been mentioning above... I had a bit of a talk with a Pandas dev who's interested in Arrow, and he mentioned that R implements sentinel values, correct? Is that how NA values are represented for Ints, and bools? It could be quite cool to support R's way of creating NA values etc. natively using xtensor_optional_assembly and related tools!
I was additionally wondering how the character / string array works in R (with regards to memory layout). Is it a bunch of `\0`-terminated strings, or do they all have the same buffer length (as they do in NumPy)? I wonder whether it would be trivial to wrap this data type in xtensor, or not ... :)
Cheers & thanks for the help!
That looks good to me, and yes, `Rcpp::stop()` it is, as we did a number of iterations on that over the years to get stack unwinding etc. right. Should "Just Work" (TM).
The (super useful) `NA` and `NaN` definitions for types other than `double` are in the R headers.
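On the sentinel question raised earlier: to my understanding, R marks a missing integer (and logical) with the smallest 32-bit integer, and a missing double with a NaN whose lower 32 bits hold the value 1954. The standalone sketch below mimics that convention without including the R headers; `kNaInteger`, `is_na_integer`, `make_na_real` and `is_na_real` are hypothetical names, and real code should use the definitions from the R headers instead.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumed sentinel for a missing integer: the smallest 32-bit int.
constexpr std::int32_t kNaInteger = std::numeric_limits<std::int32_t>::min();

bool is_na_integer(std::int32_t x) { return x == kNaInteger; }

// Build a NaN whose lower 32 bits carry the payload 1954 (0x7A2).
double make_na_real() {
    const std::uint64_t bits = 0x7FF00000000007A2ULL;
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// A double counts as NA here if it is a NaN and its lower word is 1954.
bool is_na_real(double x) {
    if (!std::isnan(x)) return false;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu) == 1954u;
}
```

Because the integer sentinel is an ordinary value of the storage type, a container wrapping R data has to treat it specially if it wants to expose missingness, for example through an optional-style mask.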
Thanks for implementing this, these merged changes look good!
This has been a bug on my part that I struggled with for a few days now, and have just figured out.
I had a simple function that took in a `SEXP` that is automatically converted to a `rarray<double>`, and then just returns it. If I passed in `c(1, 2, 3)` everything worked fine. If I passed in `1:3` it gave garbage results like `c(1e-314, 1e-314, 1e-314)`.
This is a result of me being dumb and not remembering that `c(1, 2, 3)` is a numeric vector, and `1:3` is an integer vector. So passing along the integer SEXP results in garbage when it's converted to `rarray<double>`.
An example of all of this is in the readme here (just look at the calls to `identity_cpp()`): https://github.com/DavisVaughan/xtensorfailure
With the identity function here: https://github.com/DavisVaughan/xtensorfailure/blob/master/src/example.cpp#L7
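The kind of garbage described here can be reproduced outside of R. The sketch below is not what `rarray<double>` does internally; it only assumes that integer storage ends up being read as if it held doubles. The helper name `first_as_double` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Read the first 8 bytes of an int32 buffer as if they were one double.
// The buffer must hold at least two 32-bit integers.
double first_as_double(const std::int32_t* data) {
    double out;
    std::memcpy(&out, data, sizeof out);
    return out;
}
```

For a buffer starting with the integers 1 and 2, the eight bytes form a bit pattern with a zero exponent field, so the result is a tiny subnormal number (around 4e-314 on a little-endian machine) rather than 1, which is the same order of magnitude as the values reported above.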
Would it be possible to check that the type `T` matches up with the type of the R object provided, and throw an error if not? I think you could probably do this in the `rarray` and `rtensor` constructors, where you could maybe compare `SXP` as you have defined it against `TYPEOF(SEXP_object_to_convert)` and throw an error if they are different?
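A rough, standalone sketch of the proposed check follows. It is not the xtensor-r code: `SexpType`, `FakeSexp`, `type_of`, `expected_type`, `type_matches` and `check_type` are hypothetical stand-ins. In the real constructors `type_of` would be R's `TYPEOF` macro, and the error would be raised with `Rcpp::stop()`, as confirmed elsewhere in this thread.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical type codes; nothing below depends on the actual numbers.
enum SexpType { kLogical = 10, kInteger = 13, kReal = 14 };

// Hypothetical stand-in for an R object that only carries its type code.
struct FakeSexp { SexpType type; };

// Stand-in for TYPEOF(x).
SexpType type_of(const FakeSexp& s) { return s.type; }

// Compile-time map from the C++ element type T to the expected code.
// Left undefined for unsupported T, so misuse fails at compile time.
template <typename T> struct expected_type;
template <> struct expected_type<std::int32_t> {
    static constexpr SexpType value = kInteger;
};
template <> struct expected_type<double> {
    static constexpr SexpType value = kReal;
};

// True if the run-time type of s matches what T requires.
template <typename T>
bool type_matches(const FakeSexp& s) {
    return type_of(s) == expected_type<T>::value;
}

// Throwing variant, as a constructor would use it.
template <typename T>
void check_type(const FakeSexp& s) {
    if (!type_matches<T>(s)) {
        throw std::invalid_argument(
            "R object has type code " +
            std::to_string(static_cast<int>(type_of(s))) +
            " but the container expects " +
            std::to_string(static_cast<int>(expected_type<T>::value)));
    }
}
```

With this in place, building a `double` container from integer data fails loudly at the boundary instead of returning reinterpreted bytes.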