waynelapierre opened this issue 5 years ago.
The R Core group shows no interest in such things. It might actually be easier technically to think in terms of merging the useful features of R Core releases into pqR.
I have mixed feelings about whether pqR should become more like the newest R or go its own way. I am an advocate of transferring what is really interesting in R Core to pqR, but selectively: rather less than more. However, I strongly support refactoring the whole C code base (I would be happy to get involved). The current coding style is very hard to read and therefore prone to errors. I think that focusing on quality is better than focusing on the number of features.
The main benefit of merging R Core features into pqR is that recent versions of many packages depend on recent R Core features. Many such features have been merged into pqR (or supported in a different way), but nowhere near all of them.
A lot of the new R Core features (or incompatible changes) are not very profound, but of course not having them in pqR still stops packages from working. There are some more useful changes, one being better support for https, which in pqR is supported only for Windows (as was the situation for R-2.15.1).
The big addition in R-3.0.0 was support for long vectors (with 2^31 or more elements). pqR doesn't support long vectors, although it does support (in a trivial way) the related API functions such as XLENGTH. That means that there is no package incompatibility problem - only problems if you actually need long vectors.
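For a sense of scale, .Machine$integer.max gives the largest length a vector can have without long-vector support, and a little arithmetic shows the minimum memory a long vector needs. This sketch assumes R's usual 8-byte doubles and 4-byte integers:

```r
# Largest vector length available without long vectors: 2^31 - 1.
.Machine$integer.max == 2^31 - 1

# Memory for the shortest possible long vector (2^31 elements), in GiB (2^30 bytes).
2^31 * 8 / 2^30   # double elements, 8 bytes each: 16 GiB
2^31 * 4 / 2^30   # integer elements, 4 bytes each: 8 GiB
```

So the lack of long vectors only matters for data that already needs many gigabytes of memory per vector.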
As for implementation aspects, I can think of the following major changes in R Core versions after the fork of pqR:
Long vectors, as discussed above.
Improved bytecode compiler and JIT support. In pqR, the regular interpreter is now comparably fast, so this is not needed.
A new reference counting scheme, which seems to still not be enabled in the forthcoming R-3.6.0, presumably due to problems of some sort.
The new ALTREP framework. We'll have to see how this works out. The way it is implemented, there can be non-negligible performance degradation from this feature, which has to be balanced against the cases where it improves performance. There may end up being a rather time-wasting push for changes to C code in numerous packages, to ameliorate the performance degradation. The ALTREP feature could be incorporated into pqR (with a fair amount of work, but as far as I can see no fundamental difficulty), but in pqR many of the performance benefits are obtained or could be obtained in other ways, with lower overhead.
The major new features of pqR include the following:
Numerous detailed performance improvements, as well as better methods for central operations such as symbol lookup.
Support for deferred evaluation, used for automatically parallelizing numerical computations, and for merging operations (e.g., so that 2*v+1 is done with one loop over elements of v, not two loops).
A "variant result" mechanism that is used internally to support things like making v <- v+1 be an increment operation (not allocating a new vector), and all(v>0) not allocating space for v>0 and stopping early when a negative element of v is found.
Fast matrix multiplication using SIMD instructions and multiple cores, while retaining the same roundoff behaviour as the naive R method.
A much faster garbage collector, which also leads to less memory being used for objects.
A cleaner and faster parser.
New language features that fix design flaws in the R language, and that provide programming conveniences.
Automatic differentiation, in the development version I'm now working on.
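The early-stopping behaviour described for all(v>0) can be mimicked at the R level to show what is being computed without allocating v>0. This is only an illustration of the result semantics, including how all() treats NA; pqR does this inside the interpreter in C, and an interpreted R loop like this one will normally be slower than the built-in all(v > 0). all_positive is a hypothetical name.

```r
# Hypothetical illustration: same result as all(v > 0), but no logical vector
# v > 0 is allocated, and the scan stops at the first non-positive element.
all_positive <- function(v) {
  seen_na <- FALSE
  for (x in v) {
    if (is.na(x)) {
      seen_na <- TRUE      # all() gives NA only if no FALSE is found later
    } else if (x <= 0) {
      return(FALSE)        # early exit: one FALSE settles the result
    }
  }
  if (seen_na) NA else TRUE
}

all_positive(c(3, 1, -2, 5))  # stops at -2
all_positive(c(1, NA))        # NA, as all(c(1, NA) > 0) gives
```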
From these lists, it seems like merging R Core features into pqR may make more sense than merging pqR into R Core. Possibly a split can be made between the core interpreter, where most pqR changes are, and peripheral areas like the base 'stats' package, with different merge directions for the centre and periphery.
Working on the code in the R interpreter can indeed be frustrating, due to the inconsistent and often not-very-good coding style. However, the poor quality - especially the poor quality of the API documentation - is itself a barrier to refactoring. Distant parts of the interpreter and user-written packages depend on details that one might have thought they shouldn't depend on. My approach has been to undertake substantial rewrites of some major modules - notably the garbage collector and the parser - but otherwise to make only minimal quality improvements while making other changes.
The CXXR project (later taken over by Google, renamed rho, and then dropped - see https://github.com/rho-devel/rho) aimed to undertake such a refactoring. But I think code quality alone is not a sufficient motivation for the level of effort required. Quality can also be addressed by better testing, which is one thing you could work on if you like.
Another relatively separate project that you might be interested in would be to incorporate the improvements to https support (and network connections generally) that came with R-3.0.0 and later versions. This is important for future viability. It's also important to merge changes in how vignettes are supported, and other package installation changes, since these impact package compatibility. For both of these areas, I haven't looked at all closely at the issues, so I don't know if these are fairly easy changes, or come with non-negligible difficulties.
There are also more "interesting" (i.e., not-so-easy) things to work on like figuring out how to automatically adjust the heuristics for when to do operations in parallel on multiple cores in order to take account of the characteristics of the processor being used.
Of course, what would be good to work on would depend on your interests and level of experience in various areas. I'd be happy to discuss this further.
Let's start with who pqR is an attractive choice for. In my opinion, it is an advanced user who knows what they want. New users and typical R users will prefer R Core. The argument that pqR uses less memory carries little weight for someone working in a heavyweight environment such as RStudio. Arguments that pqR is faster will probably lose out to the marketing of large companies, which makes extensive use of buzzwords: high performance, improved performance, processing power, Big Data.
R belongs to a narrow group of languages that work on arrays and matrices. Others include MATLAB, the APL family (K in kdb+/kona, and A+) and the new player, Julia. Each of these languages has its niche: K on Wall Street, A+ at Morgan Stanley(?), MATLAB and Julia in technical computing, and R in statistics. Of all these languages, R seems to have the most enjoyable syntax.
However, the use of R seems to be heading in at least two directions: traditional statistics, and data mining in combination with SQL (data science and Big Data). This second area seems increasingly to dominate the usage of R. So we have packages such as the tidyverse, which aim to make life easier (or harder). They want new products, new functionality and new user experiences... more and more. This is an area where pqR will lose too.
In my opinion, the niche that pqR can fill is quite different: a well-made and correct parser suited to more traditional use. I am also in favour of a return to the roots of R, meaning a greater focus on statistics and mathematics.
This is something that I've been thinking about. Personally, I would move packages such as the Recommended packages, tcltk, stats4, datasets and grid outside pqR (they can be installed with install.packages()), and keep only base, utils, tools, methods, stats (yes), graphics and grDevices, which are key for R.
OpenBSD has an amazing coding style. Little documentation is needed when the code is very easy to read. At present in R, even matching opening and closing braces is often a big challenge. This is something I would like to deal with. It's not even about changing what the code does, just making it more readable.
I would also like to check every file in src/main/ against R Core and make sure that all key bug fixes have also been applied in pqR (with testing). It is a big effort, but I think it would save you a lot of time.
As for optimization, at the beginning I can just focus on the simple things. Here is an example of how a trivial change can improve performance:
# R
> require(microbenchmark)
Loading required package: microbenchmark
> x <- 1L
> class(x) <- 'test'
> inherits
function (x, what, which = FALSE)
.Internal(inherits(x, what, which))
<environment: namespace:base>
> microbenchmark(inherits(x, 'test'), .Internal(inherits(x, 'test', FALSE)),
+ times = 10000)
Loading required namespace: multcomp
Unit: nanoseconds
expr min lq mean median uq max neval
inherits(x, "test") 246 282 357.0745 328 388 14193 10000
.Internal(inherits(x, "test", FALSE)) 87 105 139.5389 115 152 3229 10000
Speed can also be increased by rewriting a part of the function to C:
> require(xtime)
Loading required package: xtime
> r <- as.Date(c('2019-04-01', '2019-04-02', '2019-04-03'))
> x <- as.date(c('2019-04-01', '2019-04-02', '2019-04-03'))
>
> as.numeric(r)
[1] 17987 17988 17989
> as.numeric(x)
[1] 17987 17988 17989
> storage.mode(r) # slow double
[1] "double"
> storage.mode(x) # faster integer
[1] "integer"
> x <- c('2019-04-01', '2019-04-02', '2019-04-03')
> microbenchmark(as.Date(x), as.date(x), times = 10000)
Loading required namespace: multcomp
Unit: nanoseconds
expr min lq mean median uq max neval
as.Date(x) 115494 117617.5 120587.634 119777.5 120702.5 514179 10000
as.date(x) 777 895.0 1393.754 1240.0 1778.0 34430 10000
> as.date
function (x, ...)
.Call("as_date", x, PACKAGE = "xtime")
<environment: namespace:xtime>
Every function in base that gains in speed will also accelerate every other package that depends on it. This type of optimization speeds up pqR as a whole, not only its parser.
I could take care of this a little while I work on verifying the fixes.
I believe that your time (for developing pqR) is more valuable than mine. It would be a pity for you to spend it on less important things (simple patches, code cleaning, etc.).
"...greater focus on statistics and mathematics"
In my opinion, it makes no sense for pqR to focus on competing with R Core. It is better to return to the roots of R, i.e., statistics and mathematics. This will be appreciated by more conservative users.
For example, pqR has Unicode and tab completion support. Julia has amazing tab completion of mathematical symbols:
# y = \alpha<tab> * x + \omega<tab>
y = α * x + ω
I think that support for mathematical symbols in pqR would appeal to many people who work on statistical models. This code is portable and works in pqR (and R Core):
# R
> α = 3
> ω = 2
> x = 5
> y = α * x + ω
> y
[1] 17
You can also think about support for operators, but this code will no longer be portable and will not work by default in R Core.
# \eqgtr<tab> \eqqless<tab> \leftarrow<tab>
⋝ ⋜ ←
# or: \pi<tab>
π
π = 3.1415926535897...
# \sum<tab> \sqrt<tab> \mu<tab> etc.
∑ √ μ
R is popular at universities, and such support for mathematical symbols should help keep what is written in LaTeX and what is written in R consistent with each other.
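A tab completion of the kind described could rest on a table from LaTeX-style names to Unicode characters. The sketch below is hypothetical: latex_symbols and complete_latex are invented names, not pqR or R functions, the table covers only a few symbols, and a real implementation would still have to be hooked into the console's completion mechanism.

```r
# Hypothetical table mapping LaTeX-style names to Unicode characters.
latex_symbols <- c(
  alpha = "\u03b1", omega = "\u03c9", mu = "\u03bc", pi = "\u03c0",
  sum = "\u2211", sqrt = "\u221a", leftarrow = "\u2190"
)

# Replace a token such as \alpha by its symbol; leave anything else unchanged.
complete_latex <- function(token) {
  name <- sub("^\\\\", "", token)   # strip one leading backslash
  if (startsWith(token, "\\") && name %in% names(latex_symbols)) {
    latex_symbols[[name]]
  } else {
    token
  }
}

complete_latex("\\alpha")    # the Greek letter alpha
complete_latex("\\unknown")  # no entry, so returned unchanged
```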
So I would not try to compete with R Core, but rather make pqR the best software for statisticians and mathematicians. pqR has a very good position to go in this direction.
Another way I could help with developing pqR is maintaining package compatibility (R-package-mods). That work also absorbs time that you could devote to more valuable pqR development.
I'm guessing that pqR doesn't have many users. I think, however, that it can find a niche among more conservative people, with a special focus on statistics and mathematics.
Hi. I think I'll respond to this post in parts.
Regarding the primary focus of pqR, I think I agree with you. I consider the traditional focus of R to be statistical research and statistical analysis for scientific applications. I think you may be right that R Core's focus has departed from that a bit - more towards "big data" commercial applications.
Some needs when using R for statistical research are:
expressiveness, so research can progress without wasting time on fighting to get the method implemented.
implementation in R, not in C or C++, since that's generally easier, and maximizes the number of researchers who are able to understand and modify the program.
speed, since without speed, researchers may be forced to write in C or C++ because of otherwise intolerable slowness.
An example of this sort of usage is my GRIMS package for MCMC, a rather preliminary version of which is at http://www.cs.utoronto.ca/~radford/GRIMS.html
I'm hoping to get back to work on GRIMS after finishing automatic differentiation in R and some other extensions, which will make MCMC in R much more convenient (both for research and use).
The needs for scientific data analysis are similar to those for statistical research, except that speed is sometimes not much of an issue (though it often will be, for instance if the analysis uses MCMC). There is an additional need for reproducibility, which is something that I'm hoping to address more in pqR.
Packages like knitr have helped in this application. More integrated support for mathematical symbols is an interesting idea. It may not be too hard, since R already can support Unicode.
Continuing my comments on your post...
You point out an interesting inefficiency in many base functions - they're implemented with the .Internal mechanism rather than as primitives.
As far as I can see, the motive for this is that it allows matching of arguments, and provisioning of defaults, to be done at the R level, simplifying the C code (though not by much, overall). Given that a function is implemented this way, however, there may be packages that rely on this - for example, redefining the function by inserting trace statements at the beginning or end, which wouldn't be possible if it were a primitive (at least not in the same way). For this reason, as well as economy of effort, it would be better to just speed up functions whose body is a .Internal, rather than rewrite them all.
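The difference can be seen directly from R. In the sketch below, inherits is a closure while sum is a primitive, so statements can be inserted at the start of the body of inherits, which is the kind of thing tracing does. my_inherits is a hypothetical copy, used so that base::inherits itself is left alone; since the exact body of inherits may differ between R versions, the sketch wraps whatever body is there.

```r
x <- 1L
class(x) <- "test"

is.primitive(inherits)  # FALSE: an R closure whose body calls .Internal()
is.primitive(sum)       # TRUE: a primitive, with no R-level body to edit

# Hypothetical traced copy: prepend a statement to the closure's body.
my_inherits <- inherits
body(my_inherits) <- call("{", quote(cat("inherits called\n")), body(inherits))

my_inherits(x, "test")  # prints "inherits called", then returns TRUE
```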
I've done some tests that indicate that R Core has been working on this, at least when the bytecode compiler is used (as is standard for R Core). Here's a test script:
x <- 1L
class(x) <- "test"
f <- function () {
print(system.time (
for (i in 1:10000000) r <- inherits(x,"test")))
r
}
g <- function () {
print(system.time (
for (i in 1:10000000) r <- .Internal(inherits(x,"test",FALSE))))
r
}
f()
g()
inherits <- function (x,w) FALSE
f()
And here's output with R-3.5.2:
> f()
user system elapsed
1.534 0.000 1.534
[1] TRUE
> g()
user system elapsed
1.443 0.000 1.443
[1] TRUE
> inherits <- function (x,w) FALSE
> f()
user system elapsed
2.148 0.008 2.156
[1] FALSE
Note that replacing "inherits" with the .Internal that implements it only slightly speeds things up. Also note that redefining "inherits" as a trivial function actually gives slower results - indicating that "inherits" is being specially handled in some way.
In contrast, here is pqR-2019-02-19:
> f()
user system elapsed
2.545 0.000 2.546
[1] TRUE
> g()
user system elapsed
0.876 0.000 0.876
[1] TRUE
> inherits <- function (x,w) FALSE
> f()
user system elapsed
1.125 0.000 1.124
[1] FALSE
In pqR, replacing "inherits" with its .Internal implementation does speed things up (as you had noticed). And as expected, a trivial "inherits" is faster than the actual one. One can see that pqR is faster for all tests except the one using the base "inherits", which seems to be specially handled by R Core's bytecode compiler.
There are actually some special things going on in pqR regarding .Internal, and maybe this could be improved further to make them faster. Rewriting all of them as primitives seems like it would take time that would be better spent on other modifications.
Some more on your post...
Unfortunately, a project to clean up the code - even just cosmetically, making formatting consistent, using consistent and meaningful variable names, adding helpful comments, etc. - has a downside. It destroys the ability to see what has changed with "diff". This is a huge problem if one is trying to merge changes from another fork.
Because of this, I usually only clean up code when I'm making changes in the vicinity anyway.
Merging R Core changes in the parts of the interpreter that have been heavily modified in pqR isn't trivial. The approach I often use to add some R Core feature is to do a "diff" with the previous R Core version, not with pqR, to see what parts need changing. (That has its problems, though, since R Core does not consistently make changes with a single commit, that also adds a NEWS entry indicating what the change was - often understandably, since it sometimes takes a few tries to get things right.)
For more peripheral parts (e.g., the grid package), tracking changes is easier, though there is still a possibility that some change that appears to only involve peripheral code actually relies on some coincident change in the core interpreter.
If you'd like to discuss further what it might be useful for you to work on, that's probably better done via email (to radfordneal@gmail.com) than in this issues thread. There are various possibilities, but I'd have to know more about your areas of experience to know which to suggest...
@radfordneal wrote: "If you'd like to discuss further what it might be useful for you to work on, that's probably better done via email (to radfordneal@gmail.com) than in this issues thread."
It's good to keep the discussion open to others, so a mailing list or GitHub 'Projects' would be a good solution. Maybe it is worth starting a GitHub 'Project' for discussion of general development (e.g., pqR Ideas)? If not, my email is daniel.cegielka@gmail.com.
I can start with your suggestion about https. I have already reviewed the changes, and the main functionality lacking in pqR is here: https://github.com/wch/r-source/commit/6b3353ea2e48c19b2a1120db05b2bebe31260d63#diff-03c2afd132d3ea79e5e6e204ec246201
Parser and .Internal
My observations have led me to conclude that slow code often results from (1) the same functions being called many times (duplication) and (2) functions being reached indirectly (a longer call path). The differences seem small because they are measured in nanoseconds (so nobody cares), but when they are repeated many times (loops, vectors) the whole program becomes noticeably slower. I don't have a simple solution here. (pq)R developers can work at a low level (direct calls), but users shouldn't have to, and the important point is what you wrote: it would be better to just speed up functions whose body is a .Internal.
Compiler (bytecode) vs parser
In theory, bytecode (compiled code) should be faster than the parser. In theory. The best and fastest parser I've ever seen is K's (in kdb+), whose speed is comparable to C. One reason K is so fast is that the language itself is minimal, and programs written in it are extremely short. This means that the parser and the K code often fit entirely in the CPU cache (even L1/L2). K programmers usually don't use unnecessary spaces, so the parser doesn't have to waste time on them. In that setting, a bytecode compiler makes no sense.
The strength of pqR is its parser, whose speed is comparable to (or even much better than) what the R Core bytecode compiler offers. Unfortunately, the parser's speed depends directly on the programmer and their experience. You showed this in your test (compiler & inherits).
> require(compiler)
Loading required package: compiler
> require(microbenchmark)
Loading required package: microbenchmark
>
> f1 <- function()NULL
> f2 <- function(){NULL}
> c1 <- cmpfun(f1)
> c2 <- cmpfun(f2)
>
> microbenchmark(f1(),f2(),c1(),c2(), times=1000000)
Loading required namespace: multcomp
Unit: nanoseconds
expr min lq mean median uq max neval
f1() 44 59 72.03692 60 62 364360 1e+06
f2() 64 76 90.51414 81 84 215226 1e+06
c1() 57 69 83.41272 74 79 37587 1e+06
c2() 72 82 98.65406 86 93 220972 1e+06
Note: Ignore the mean and max, since they are susceptible to GC and the OS/kernel scheduler.
Removing the unnecessary braces in this case reduces latency by 25-30% (ceteris paribus). The conclusion is that an experienced programmer using the pqR parser can write code that is much faster than usual. This is because the parser is directly sensitive to what the programmer writes, so the programmer can write parser-optimized code, and a bytecode compiler adds no value.
Code refactoring
@radfordneal wrote: "Unfortunately, a project to clean up the code - even just cosmetically, making formatting consistent, using consistent and meaningful variable names, adding helpful comments, etc. - has a downside. It destroys the ability to see what has changed with "diff". This is a huge problem if one is trying to merge changes from another fork. Because of this, I usually only clean up code when I'm making changes in the vicinity anyway."
I'm aware of this problem, because I've hit it before. No option is cheap: R Core is starting to improve its coding style, which means an ever larger diff against the pqR code. My idea is to have two pqR repositories:
pqR_clean: a completely cleaned-up version.
pqR_rcore: the current pqR, where the old code corresponding to R Core is maintained.
New pqR code would be developed in pqR_clean, and the changes (new code only, no cleaning) would then be applied to pqR_rcore. pqR_rcore would show the differences from R Core (as pqR does now). Anything transferred from R Core to pqR would be applied first to pqR_rcore and then to pqR_clean (after cleaning and commenting). Running diff pqR_rcore pqR_clean would show the cleaned code against the old code (old meaning as in R Core), and diff R_Core pqR_rcore would show the changes introduced in pqR relative to R Core.
This approach raises maintenance costs on the one hand, because there are two main repositories, but on the other it reduces development costs, because the code itself will be clearer and better documented. I suppose that pqR's diff against R Core grows from version to version, so in the long run my proposal should gain value. This model has worked well for Linux, so why not for pqR? We could even start with one C file and see whether the model works.
Regarding https, I think the commit you reference can't be all of it. A starting point may be this comment from NEWS for R-3.0.0:
It is now possible to write custom connection implementations outside core R using R_ext/Connections.h. Please note that the implementation of connections is still considered internal and may change in the future (see the above file for details).
I think this facility is used by some packages that one would want to be able to use for internet access. There are presumably some changes to src/main/connections.c as well.
Regarding some of your other comments, they make more sense to me if I assume that by "parser" you actually mean "interpreter". The parser for pqR can be faster than R Core's, but that is usually not a big issue. What you mean, I think, is that the interpreter for the parsed code (R language objects) is faster in pqR, which is certainly true, and indeed it's often faster than R Core's bytecompiled code.
I don't recommend using microbenchmark for timing comparisons - see my blog post at https://radfordneal.wordpress.com/2014/02/02/inaccurate-results-from-microbenchmark/
I've tried both microbenchmark and system.time for your example, for pqR-2019-02-19 with and without bytecode enabled (pqR now ignores bytecode unless R_USE_BYTECODE=TRUE is set as an environment variable) and for R-3.5.2 with and without JIT enabled (disabled by the R_ENABLE_JIT=0 environment setting). I used system.time with ten times as many repetitions, which actually takes about the same time as microbenchmark, due to its overhead. I called the functions ten times in the for loop in system.time, to minimize the loop overhead.
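The measurement approach just described (many repetitions under system.time, with ten calls per loop iteration to dilute the loop overhead) can be wrapped in a small helper. This is a hypothetical sketch, not the code that produced the timings in this thread; per_call_ns is an invented name, and passing the function as an argument adds a small lookup cost of its own.

```r
# Hypothetical helper: nanoseconds per call of a zero-argument function,
# estimated from system.time() with ten unrolled calls per loop iteration.
per_call_ns <- function(f, reps = 1e6) {
  elapsed <- system.time(
    for (i in seq_len(reps)) {
      f(); f(); f(); f(); f(); f(); f(); f(); f(); f()
    }
  )[["elapsed"]]
  elapsed / (10 * reps) * 1e9   # seconds per call, converted to nanoseconds
}

f1 <- function() NULL
f2 <- function() {NULL}
per_call_ns(f1)
per_call_ns(f2)
```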
Here are the results:
R-3.5.2 with JIT enabled:
> microbenchmark(f1(),f2(),c1(),c2(), times=10000000)
Unit: nanoseconds
expr min lq mean median uq max neval
f1() 84 94 136.7001 103 107 79314038 1e+07
f2() 84 95 111.6219 102 106 2103797 1e+07
c1() 88 97 116.2637 106 110 1747074 1e+07
c2() 88 98 115.4068 105 109 1804220 1e+07
> system.time(for (i in 1:10000000) {
+ f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1()
+ })
user system elapsed
8.066 0.000 8.067
> system.time(for (i in 1:10000000) {
+ f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2()
+ })
user system elapsed
8.115 0.000 8.116
> system.time(for (i in 1:10000000) {
+ c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1()
+ })
user system elapsed
8.309 0.000 8.309
> system.time(for (i in 1:10000000) {
+ c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2()
+ })
user system elapsed
8.331 0.005 8.335
> system.time(gc())
user system elapsed
0.016 0.000 0.015
R-3.5.2 with no JIT (but cmpfun still compiling, of course):
> microbenchmark(f1(),f2(),c1(),c2(), times=10000000)
Unit: nanoseconds
expr min lq mean median uq max neval
f1() 74 93 115.1359 94 98 79024564 1e+07
f2() 101 118 138.7027 120 124 78989802 1e+07
c1() 92 102 120.0987 111 116 1922703 1e+07
c2() 93 104 130.4335 112 117 79248321 1e+07
> system.time(for (i in 1:10000000) {
+ f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1()
+ })
user system elapsed
7.560 0.000 7.559
> system.time(for (i in 1:10000000) {
+ f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2()
+ })
user system elapsed
9.961 0.000 9.962
> system.time(for (i in 1:10000000) {
+ c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1()
+ })
user system elapsed
9.272 0.000 9.272
> system.time(for (i in 1:10000000) {
+ c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2()
+ })
user system elapsed
9.213 0.000 9.213
> system.time(gc())
user system elapsed
0.015 0.000 0.014
pqR-2019-02-19 with use of bytecode disabled (compiler still generates it, but it's then ignored):
> microbenchmark(f1(),f2(),c1(),c2(), times=10000000)
Loading required namespace: multcomp
Unit: nanoseconds
expr min lq mean median uq max neval
f1() 34 41 44.19212 42 43 148532 1e+07
f2() 44 52 55.58601 53 55 165791 1e+07
c1() 39 43 49.07375 47 50 218246 1e+07
c2() 50 57 94.73496 60 62 325279220 1e+07
> system.time(for (i in 1:10000000) {
+ f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1()
+ })
user system elapsed
3.88 0.00 3.88
> system.time(for (i in 1:10000000) {
+ f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2()
+ })
user system elapsed
5.377 0.000 5.377
> system.time(for (i in 1:10000000) {
+ c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1()
+ })
user system elapsed
4.551 0.000 4.552
> system.time(for (i in 1:10000000) {
+ c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2()
+ })
user system elapsed
5.862 0.000 5.862
> system.time(gc())
user system elapsed
0.009 0.000 0.008
pqR-2019-02-19 with bytecode actually used:
> microbenchmark(f1(),f2(),c1(),c2(), times=10000000)
Loading required namespace: multcomp
Unit: nanoseconds
expr min lq mean median uq max neval
f1() 32 40 43.46826 41 42 165362 1e+07
f2() 43 52 88.46080 54 55 324854537 1e+07
c1() 41 44 48.81613 45 50 149475 1e+07
c2() 40 44 48.43223 45 49 147211 1e+07
> system.time(for (i in 1:10000000) {
+ f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1()
+ })
user system elapsed
3.792 0.000 3.792
> system.time(for (i in 1:10000000) {
+ f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2(); f2()
+ })
user system elapsed
5.306 0.000 5.306
> system.time(for (i in 1:10000000) {
+ c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1(); c1()
+ })
user system elapsed
4.591 0.000 4.591
> system.time(for (i in 1:10000000) {
+ c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2(); c2()
+ })
user system elapsed
4.588 0.000 4.588
> system.time(gc())
user system elapsed
0.009 0.000 0.008
One thing to note is that none of the columns of output for microbenchmark correspond well to the output of system.time (divided by ten). If one takes the system.time output as being correct (as I do), this indicates that microbenchmark has problems. One can also see this by comparing the max times from microbenchmark with the time for a full garbage collection (see the last system.time call), which ought to be the main reason for occasional large times. The max time is actually much larger than the time for a full garbage collection, which is rather strange.
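The conversion behind this comparison can be made explicit. Each system.time loop above made 10 × 10,000,000 = 10^8 calls, so elapsed seconds divided by 10^8 and multiplied by 10^9 gives nanoseconds per call. Applied to the pqR-2019-02-19 run with bytecode disabled, this gives about 38.8, 53.8, 45.5 and 58.6 ns for f1, f2, c1 and c2, which can be set beside the microbenchmark columns for the same run (medians 42, 53, 47 and 60 ns).

```r
# Elapsed seconds reported above by system.time() for pqR-2019-02-19 with
# bytecode disabled; each loop made 10 * 1e7 = 1e8 calls.
elapsed <- c(f1 = 3.88, f2 = 5.377, c1 = 4.551, c2 = 5.862)
ns_per_call <- elapsed / 1e8 * 1e9
ns_per_call
```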
As far as the substance of the result goes, one can indeed see that there's a substantial difference between NULL and {NULL} when the code is interpreted - though note that the difference would be much less in relative terms if the function did something useful, rather than just return NULL. This is partially just the interpretive overhead of calling the {} operator, but it also has to do with {} maintaining some stuff that helps with debugging, which maybe could be handled in some more efficient way.
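The extra work in {NULL} is visible in the parsed code: the body of f2 is a call to the { function, whereas the body of f1 is just the constant NULL, so interpreting f2 involves one more function call. A minimal check in R:

```r
f1 <- function() NULL
f2 <- function() {NULL}

body(f1)            # just the constant NULL: nothing to call
is.call(body(f2))   # TRUE: the body is a call
body(f2)[[1]]       # the function being called, `{`
is.primitive(`{`)   # TRUE: `{` is itself a primitive function
```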
In compiled code, either in R-3.5.2 or pqR-2019-02-19, NULL and {NULL} have the same performance - presumably the compiler just ignores the redundant {}. It's interesting that for both R-3.5.2 and pqR-2019-02-19, the bytecode versions are faster for {NULL} but not NULL. Of course, this is of limited practical relevance, since the function is unrealistically simple.
When pqR is run with bytecode being ignored, the compiled versions are slower because the interpreter gets to the uncompiled code indirectly when it has to ignore bytecode along the way.
It's a bit puzzling that when the JIT is enabled R-3.5.2 is faster with the function compiled by JIT rather than explicitly with cmpfun. But the difference isn't large. R-3.5.2 is quite a bit slower than interpreted pqR (and also than bytecompiled pqR). The bytecode implementation has been "improved" in recent R Core versions, but it is possible that these changes actually make things slower in some cases, perhaps because they lead to more objects being created and accessed, with bad effects on garbage collection time and cache performance. One can note that the amount of memory occupied by R objects when a clean R session is started is quite a bit larger for R Core than for pqR, largely, I think, due to all the bytecode for base functions (though objects are also smaller in pqR, due to more compact memory layout made possible by pqR's new garbage collector).
First, I want to express my deep admiration for your great work. It is incredible that you maintain this project single-handedly. I was wondering whether your pqR code could be merged into R Core, since that would improve its performance.