Open jrosell opened 2 months ago
Hey @jrosell, thanks for your ideas. I'm not yet sure I understand you correctly: do you imagine a wrapper function around rix::rix() that generates the above shell script? I think it's sufficient and easier to just write a really short R script that defines the environment:
# env.R
rix::rix(
  r_ver = "4.3.2",
  r_pkgs = "data.table",
  overwrite = TRUE,
  project_path = "./my_proj_subdir"
)
Run that env.R in your R session or via Rscript.
Then use a custom bash script, nix-rscript.sh, with nix-shebang syntax. It could be shipped in inst/extdata, together with a helper (to be implemented) that copies it to the current project directory. Make it executable with chmod +x:
#!/usr/bin/env nix-shell
#! nix-shell -i bash --pure default.nix
# Quote the script path so file names with spaces survive word splitting.
Rscript \
  --no-site-file \
  --no-environ \
  --no-restore \
  "$1"
And then just run:
./nix-rscript.sh data-visualize.R
To sum up, maybe like a littler script helper. https://nix.dev/tutorials/first-steps/reproducible-scripts.html is a nice ref.
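The "helper to copy it to the current project directory" mentioned above does not exist yet. A minimal sketch of what it could look like as a shell function follows; the name install_nix_rscript is made up for illustration and is not part of rix, and the wrapper body is simply the script from this comment.

```shell
# Hypothetical helper (not part of rix): write the nix-rscript.sh wrapper
# into a project directory and mark it executable.
install_nix_rscript() {
  target_dir="${1:-.}"
  target="$target_dir/nix-rscript.sh"

  # The quoted 'EOF' keeps "$1" literal, so it is expanded only when the
  # wrapper itself runs, not when this helper writes the file.
  cat > "$target" <<'EOF'
#!/usr/bin/env nix-shell
#! nix-shell -i bash --pure default.nix
Rscript \
  --no-site-file \
  --no-environ \
  --no-restore \
  "$1"
EOF

  chmod +x "$target"
  printf '%s\n' "$target"
}
```

Calling install_nix_rscript ./my_proj_subdir would place the wrapper next to the default.nix that rix::rix() generated there.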
Well, the goal is to have rix annotations at the script level, so one can run something like: rix run script.R
In Python, one can use inline annotations and run: uv run script.py
Well, these are at least two separate things. I like those inline annotations, but they need a lot on top of the nix-Rscript runner. Tooling in Nix? Nix ensures reproducibility via output hashes, based on the inputs in the supplied expression. Currently, the rix boilerplate assumes one fixed nixpkgs revision (git hash) for specific packages, but in principle it could be extended to multiple. Quite a bit of tooling is needed before we can leverage the renv lockfile-to-Nix work (see https://github.com/b-rodrigues/rix/issues/5). Ideas and PRs are very welcome. Want to join the Nixpkgs R Matrix channel? It could be a good place to brainstorm, too.
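To make the inline-annotation idea a bit more concrete, here is a rough sketch of the smallest piece a runner would need: pulling an annotation block out of the top of an R script. The "# /// rix" and "# ///" markers are purely an assumption, borrowed from the comment-block style of Python's inline script metadata; neither rix nor rix-run is known to define this syntax, and the function name is made up.

```shell
# Print the body of a "# /// rix" ... "# ///" comment block from an R script.
# The block markers are a hypothetical convention, not something rix defines.
extract_rix_block() {
  awk '
    /^# \/\/\/ rix[ \t]*$/ { inside = 1; next }    # opening marker
    inside && /^# \/\/\/[ \t]*$/ { exit }          # closing marker
    inside { sub(/^# ?/, ""); print }              # strip the comment prefix
  ' "$1"
}
```

A runner could then turn the extracted lines into arguments for rix::rix() before starting nix-shell; that part is where most of the tooling effort mentioned above would go.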
Here is what I have. It works on Ubuntu: https://github.com/jrosell/rix-run
That's really cool! I must admit that I didn't really understand what you meant at first, but now that I see it, it's really nice!
How would you like to move forward with this? Would you like to have it included in rix? We are about to submit to CRAN very soon, so now wouldn't be the right moment to add a completely new feature. However, if you want to continue working on it, feel free, and we could merge a PR for a later release.
I think that the rix-run script belongs in rix, but I believe the script should work fine on more systems first. So, we can wait.
To keep you updated, it turns out that rix-run plays well with targets script files too. I really like the ability to have multiple targets scripts in the same project.
https://github.com/jrosell/rix-run?tab=readme-ov-file#targets-single-file
I thought about this idea and I think it could be taken further using {processx}, as {callr} does.
I imagine something like this for testing the same function on different R versions using nix-shell processes:
bench::mark(
  rix::run(rix::rix(r_ver = "4.3.1"), my_function),
  rix::run(rix::rix(r_ver = "4.4.1"), my_function)
)
What do you think?
Running R functions in different Nix R environments is exactly what with_nix(), which I implemented, does. See e.g. https://github.com/ropensci/rix/blob/287e8bd5d41649247747a499e459ef33cc7c76e0/R/with_nix.R#L284-L292 We also have docs for it:
https://docs.ropensci.org/rix/articles/z-advanced-topic-running-r-or-shell-code-in-nix-from-r.html
We do it via {sys} and have some safe defaults to run code in different Nix shells, with proper recursive detection of globals etc. The approach works really well, and I don't think it's necessary to duplicate the functionality.
For functionality under the hood, see https://github.com/ropensci/rix/blob/main/R/with_nix_helpers.R
Cheers, Philipp
Thanks, Philipp. I tested it a bit and I get some weird results with this approach. I assume it's because it doesn't make sense to benchmark with less than 10s precision with this implementation.
benchmark_dummy <- \() {
  invisible(NULL)
}
benchmark_memCompress <- \() {
  txt <- readLines(file.path(R.home(), "COPYING"))
  for (i in 1:100) {
    memCompress(txt, "g")
  }
  invisible(NULL)
}
results_r <- bench::mark(
  dummy = {
    benchmark_dummy()
  },
  memCompress = {
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)
results_r[,c("expression", "median")]
#> # A tibble: 2 × 2
#> expression median
#> <bch:expr> <bch:tm>
#> 1 dummy 250.1ns
#> 2 memCompress 43.4ms
# Configuring and initial set up of the two environments
rix::rix(r_ver = "3.6.3", project_path = "/tmp/R/3.6.3", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
rix::rix(r_ver = "latest", project_path = "/tmp/R/latest", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")
# Get the fastest time
results_dummy <- bench::mark(
  old_dummy = {
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  },
  new_dummy = {
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_dummy[,c("expression", "median")]
#> # A tibble: 2 × 2
#> expression median
#> <bch:expr> <bch:tm>
#> 1 old_dummy 5.05s
#> 2 new_dummy 7.05s
# Get the benchmark times
results_memCompress <- bench::mark(
  old_memCompress = {
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/3.6.3")
  },
  new_memCompress = {
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/latest")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_memCompress[,c("expression", "median")]
#> # A tibble: 2 × 2
#> expression median
#> <bch:expr> <bch:tm>
#> 1 old_memCompress 5.69s
#> 2 new_memCompress 8.36s
Yes, exactly: it doesn't make sense to benchmark this way, because there is serialization/deserialization overhead (including detecting and assigning globals recursively beforehand), plus the time to invoke nix-shell (which is known to be relatively slow as packaged in NixCpp).
On my aarch64 MacBook M2 I currently get about 2.5s median time (my Rocky Linux machine on my home network is currently not reachable via SSH). I had to switch to microbenchmark::microbenchmark() because bench::mark() errored with a file unlinking problem. I also only tested the dummy function in "latest" R, because back then that architecture did not exist in nixpkgs. The 2.5 seconds I got would also match a similar overhead reported between the Haskell build tool and nix-shell invocation: https://github.com/commercialhaskell/stack/issues/4406
benchmark_dummy <- \() {
  invisible(NULL)
}
benchmark_memCompress <- \() {
  txt <- readLines(file.path(R.home(), "COPYING"))
  for (i in 1:100) {
    memCompress(txt, "g")
  }
  invisible(NULL)
}
results_r <- bench::mark(
  dummy = {
    benchmark_dummy()
  },
  memCompress = {
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)
r_latest_path <- file.path("latest")
r_3_6_3_path <- file.path("3.6.3")
results_r[, c("expression", "median")]
# Configure and initially set up the two environments
# R 3.6.3 is not available for aarch64-darwin; it will not build because at that
# time nixpkgs did not yet support the Apple Silicon architecture
# rix::rix(r_ver = "3.6.3", project_path = r_3_6_3_path, overwrite = TRUE)
# rix::nix_build(project_path = r_3_6_3_path)
rix::rix(r_ver = "latest", project_path = r_latest_path, overwrite = TRUE)
rix::nix_build(project_path = r_latest_path)
# Get the fastest time
results_dummy <- bench::mark(
  # old_dummy = {
  #   rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  # },
  new_dummy = {
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  check = FALSE,
  memory = FALSE,
  filter_gc = FALSE,
  min_time = 10
)
benchmark_new <- microbenchmark::microbenchmark(
  new_dummy = {
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  times = 20
)
Where I get:
> benchmark_new
Unit: seconds
      expr      min     lq    mean   median       uq      max neval
 new_dummy 2.379217 2.4867 2.54477 2.503232 2.584663 2.785343    20
Whatever the case, you will have the overhead of nix-shell, which is significant, when you launch everything from the same session. Otherwise, you can just open two Nix R sessions in different subfolders and run the same R scripts for benchmarking in separate R environments.
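One way to read the "separate sessions" suggestion is to start nix-shell once per environment and let the benchmark run entirely inside that R process, so the startup cost stays outside the timings. A rough sketch follows, assuming each subfolder (here latest and 3.6.3, as in the code above) holds a default.nix generated by rix::rix(). The script name bench.R, the function name and the DRY_RUN switch are assumptions made for illustration.

```shell
# Sketch: run one benchmark script once per Nix environment, so that the
# nix-shell startup happens outside the timings taken inside R.
# "bench.R" and DRY_RUN are illustrative assumptions.
run_in_envs() {
  script="$1"
  shift
  for dir in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only show the command that would be run.
      printf 'nix-shell %s --pure --run "Rscript --vanilla %s"\n' \
        "$dir/default.nix" "$script"
    else
      nix-shell "$dir/default.nix" --pure --run "Rscript --vanilla $script"
    fi
  done
}
```

For example, run_in_envs bench.R latest 3.6.3 would execute the same script in both environments, with bench.R calling bench::mark() on the functions directly instead of going through with_nix().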
I'm not sure I fully understand what you said in the last paragraph. Do you mean running two separate benchmarks in two different scripts? I think I can try it with my rix-run tool. It could make sense.
I tried the {rix} package today and I thought of two features that could make it more awesome for R development.
Let me give an example here.
data-visualize.R file
rix file
The first feature is a rix command line tool. For example, one can run: bash rix $(pwd)/data-visualize.R to generate the 'penguin-plot.png' plot.
The second feature is inline script metadata for R, like Python already has.
If you look at my code for the rix file, I already set the rix R command in the nix-shell call, but I think it could be annotated in some way in the file to be run.
Let me know what you think.