Closed adamsardar closed 5 years ago
Hi Adam,
thank you for reporting this. Coincidentaly, we've been recently looking into data.table and adding support for missing functionality in FastR that data.table uses, for example in [1]. This issue that you pointed out is indeed caused by implementation difference, in a longer term we may slightly rework our data representation to support this scenario. However, there are more fundamentals problems with data.table, so I don't think we can support that package "as is" right now or in near future unless the authors decide to modify it. We're also considering providing a patched version for FastR. For technical details please read on.
The fundamental problem with data.table is that it uses the DATAPTR
macro to access internal data of vectors and copy them using e.g. memcpy
. In FastR we can support this for atomic types, but it is suboptimal and if you do this to character vectors, it will confuse the Java GC. In fact, it's also not the recommended approach even for GNUR GC [2], but unlike us it can tolerate it. One possibility is to provide data.table patched for FastR. Another is that the ALTREP GNU R branch [3] gets merged and it motivates the data.table authors to use Dataptr_or_null
before using DATAPTR
and providing an alternative code path if using DATAPTR
is not desired (i.e. Dataptr_or_null
returns null
).
The issue that you pointed out is interesting: data.table checks that list(x)
does not copy x
, which even in FastR it doesn't, however, to check that no copy was made data.table uses C function address
. FastR has efficient representation of sequences, but we have to materialize this compact representation to full vector when it is passed to native code, hence the address
function returns different values each time it gets called with the same compact sequence.
P.S.: thank you for the kind words and please feel free to share with us more about how you use FastR and which other packages are important for your scenario.
[1] https://github.com/oracle/fastr/commit/cbeedb464a39d7382888fd92b5dbcd82b2b3eb29 [2] https://cran.r-project.org/doc/manuals/r-release/R-ints.html#The-write-barrier [3] https://www.r-project.org/dsc/2017/slides/dsc2017.pdf
There is now FastR specific version of data.table
. It can be installed in FastR using:
install.fastr.packages('data.table')
I am closing this issue. If you encounter any issues with concrete data.table
examples that don't work on FastR, please open new issues for those.
Hi,
On installing data.table for R within the precompiled linux binary of graalvm (graalvm-ce-1.0.0-rc2), I receive the following error (I've highlighted the important line):
However, data.table version 1.10.4 installation does work:
I notice that between 1.10.4 and 1.11.4 (current) there has been a requirement change in R from 3.0 to 3.1.
The crux of the issue is this section in .onLoad:
'# Test R behaviour that changed in v3.1 and is now depended on x = 1L:3L y = list(x) if (address(x) != address(y[[1L]])) stop("Unexpected base R behaviour: list(x) has copied x")
Does this point to an imlementation difference between GNU R and FastR?
Adam
p.s. thank you for all of your hard work with FastR - I am convinced that it will be a real boon to the community.