oracle / fastr

A high-performance implementation of the R programming language, built on GraalVM.
Other
626 stars 64 forks source link

Installation of data.table 1.11.4 fails (but 1.10.4 succeeds) #16

Closed adamsardar closed 5 years ago

adamsardar commented 6 years ago

Hi,

On installing data.table for R within the precompiled linux binary of graalvm (graalvm-ce-1.0.0-rc2), I receive the following error (I've highlighted the important line):

install.packages("data.table") ... installing vignettes testing if installed package can be loaded Error: package or namespace load failed for ‘data.table’ in runHook(".onLoad", env, package.lib, package): .onLoad failed in loadNamespace() for 'data.table', details: call: fun(libname, pkgname) error: Unexpected base R behaviour: list(x) has copied x Error: loading failed ... installation of package ‘data.table’ had non-zero exit status

However, data.table version 1.10.4 installation does work:

install.packages("http://cran.r-project.org/src/contrib/Archive/data.table/data.table_1.10.4.tar.gz")

I notice that between 1.10.4 and 1.11.4 (current) there has been a requirement change in R from 3.0 to 3.1.

The crux of the issue is this section in .onLoad:

'# Test R behaviour that changed in v3.1 and is now depended on x = 1L:3L y = list(x) if (address(x) != address(y[[1L]])) stop("Unexpected base R behaviour: list(x) has copied x")

Does this point to an imlementation difference between GNU R and FastR?

Adam

p.s. thank you for all of your hard work with FastR - I am convinced that it will be a real boon to the community.

steve-s commented 6 years ago

Hi Adam,

thank you for reporting this. Coincidentaly, we've been recently looking into data.table and adding support for missing functionality in FastR that data.table uses, for example in [1]. This issue that you pointed out is indeed caused by implementation difference, in a longer term we may slightly rework our data representation to support this scenario. However, there are more fundamentals problems with data.table, so I don't think we can support that package "as is" right now or in near future unless the authors decide to modify it. We're also considering providing a patched version for FastR. For technical details please read on.

The fundamental problem with data.table is that it uses the DATAPTR macro to access internal data of vectors and copy them using e.g. memcpy. In FastR we can support this for atomic types, but it is suboptimal and if you do this to character vectors, it will confuse the Java GC. In fact, it's also not the recommended approach even for GNUR GC [2], but unlike us it can tolerate it. One possibility is to provide data.table patched for FastR. Another is that the ALTREP GNU R branch [3] gets merged and it motivates the data.table authors to use Dataptr_or_null before using DATAPTR and providing an alternative code path if using DATAPTR is not desired (i.e. Dataptr_or_null returns null).

The issue that you pointed out is interesting: data.table checks that list(x) does not copy x, which even in FastR it doesn't, however, to check that no copy was made data.table uses C function address. FastR has efficient representation of sequences, but we have to materialize this compact representation to full vector when it is passed to native code, hence the address function returns different values each time it gets called with the same compact sequence.

P.S.: thank you for the kind words and please feel free to share with us more about how you use FastR and which other packages are important for your scenario.

[1] https://github.com/oracle/fastr/commit/cbeedb464a39d7382888fd92b5dbcd82b2b3eb29 [2] https://cran.r-project.org/doc/manuals/r-release/R-ints.html#The-write-barrier [3] https://www.r-project.org/dsc/2017/slides/dsc2017.pdf

steve-s commented 5 years ago

There is now FastR specific version of data.table. It can be installed in FastR using:

install.fastr.packages('data.table')

I am closing this issue. If you encounter any issues with concrete data.table examples that don't work on FastR, please open new issues for those.