Closed stla closed 1 year ago
The fastmap does use an external pointer to a map data structure in C++. However, it also can be saved as an rds file. When a fastmap object is loaded from an rds, the first time you call a method on it, it will rebuild the C++ data structure from the R data structures which can be restored from the rds file. See https://github.com/r-lib/fastmap#serialization for more info.
I don't understand how your fastmap represents a number like sqrt(2)
. But generally speaking, if you have a small number of keys, the method you describe should be fine. Just keep in mind that if you are using a named list as the backing store, R will do a linear search for keys. This is fine for relatively small number of keys, but will slow down as the list gets larger. You could also use an environment, which like fastmap has a constant lookup time.
However, it also can be saved as an rds file.
Awesome. I love R!
I don't understand how your fastmap represents a number like sqrt(2).
My fastmap represents a cyclotomic number. The cyclotomic numbers are complex numbers. They form a field and they contain all square roots of integers (and more). Mathematically this is a polynomial, encoded as a fastmap. The interest is that you can do exact calculations, e.g. sqrt(2) * sqrt(3) == sqrt(6)
is TRUE
, there's no conversion to float.
Thanks!
Hi @wch
I finished my R package (cyclotomic, soon on CRAN). Finally I don't use fastmap. I needed an ordered container with integer keys for cyclotomic, then I did another package, intmap (on CRAN now), which wraps the Boost flat_map
. The intmap
is slower than a list in general, but it is implemented as a R6 class and it can be more convenient than a list in some situations. On the Rcpp side, the intmap
is implemented as a Rcpp module, I'm wondering whether this is why it is slower than a list.
In the description of fastmap, you say that a flatmap
is better than a list from a memory perspective. Could you please tell me how do you assess that? Is there a R package to measure the "memory performance"? (I'm totally ignorant in this domain, I don't know what is a "memory leak").
In the description of fastmap, you say that a
flatmap
is better than a list from a memory perspective.
Do mean fastmap
instead of flatmap
?
In the README, I say that fastmap
does not leak memory, as opposed to an environment, not a list.
Environments are the R data structure that is best suited for use as a key-value store. Named lists are generally not as good as environments, because with lists, (A) it does a linear search to find the key, and (B) adding or removing elements can result R making a copy of the list, which can be slow.
However, for small sets of items, named lists might be as fast or faster than environments because there is less overhead in these operations. If you want to understand this stuff, you should learn about big O notation and linear search vs. hash tables.
This example demonstrates the memory leak: https://github.com/r-lib/fastmap#memory-leak-examples
If you need to learn what a memory leak is, there are resources on the internet that do a better job explaining it than I can.
Ok, thanks. Yes I meant fastmap
. The flat_map
is the map of my package, from Boost. I will take a look to hash tables and to your link.
On my Linux, with your first code (the one with a regular environment), I get something different than the results you show: the mem_used
is constant, and the memory leaked is low (112 bytes). With my intmap
, the elapsed time is indeed higher, and the memory leaked is 0 byte.
On my Mac and on Linux (with the latest version of R-devel) I still get the memory leak.
On Linux:
> sessionInfo()
R Under development (unstable) (2023-04-16 r84269)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/local/RD/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fastmap_1.1.1 pryr_0.1.6
loaded via a namespace (and not attached):
[1] compiler_4.4.0 magrittr_2.0.3 cli_3.6.1 tools_4.4.0
[5] parallel_4.4.0 glue_1.6.2 Rcpp_1.0.10 stringi_1.7.12
[9] codetools_0.2-19 stringr_1.5.0 lifecycle_1.0.3 rlang_1.1.0
> library(pryr)
gc()
start_mem <- mem_used()
start_time <- as.numeric(Sys.time())
for (i in 1:8) {
cat(i, ": ", sep = "")
print(mem_used())
e <- new.env(parent = emptyenv())
for (j in 1:10000) {
# Generate random key
x <- as.character(runif(1))
exists(x, envir = e, inherits = FALSE)
}
rm(e, x)
}
end_time <- as.numeric(Sys.time())
gc()
end_mem <- mem_used()
cat("Elapsed time:", round(end_time - start_time, 1), "seconds\n")
cat("Memory leaked:", end_mem - start_mem, "bytes\n")
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 459604 24.6 977744 52.3 670820 35.9
Vcells 829506 6.4 8388608 64.0 1869433 14.3
1: 36.8 MB
2: 39 MB
3: 41 MB
4: 43 MB
5: 45 MB
6: 47 MB
7: 49 MB
8: 51 MB
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 761752 40.7 1213292 64.8 1084541 58
Vcells 1288630 9.9 8388608 64.0 4449608 34
Elapsed time: 2 seconds
Memory leaked: 20589896 bytes
I would be surprised if you are not seeing a memory leak. What is your sessionInfo
, and are you sure you're running that exact same code?
Indeed.
> library(pryr)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 808656 43.2 1548232 82.7 1193947 63.8
Vcells 1927909 14.8 8388608 64.0 4172681 31.9
> start_mem <- mem_used()
> start_time <- as.numeric(Sys.time())
> for (i in 1:8) {
+ cat(i, ": ", sep = "")
+ print(mem_used())
+ e <- new.env(parent = emptyenv())
+ for (j in 1:10000) {
+ # Generate random key
+ x <- as.character(runif(1))
+ exists(x, envir = e, inherits = FALSE)
+ }
+ rm(e, x)
+ }
1: 60.8 MB
2: 63 MB
3: 65 MB
4: 67 MB
5: 69.5 MB
6: 71.5 MB
7: 73.5 MB
8: 75.5 MB
> end_time <- as.numeric(Sys.time())
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1052179 56.2 1897878 101.4 1373544 73.4
Vcells 2319993 17.8 8388608 64.0 5480731 41.9
> end_mem <- mem_used()
> cat("Elapsed time:", round(end_time - start_time, 1), "seconds\n")
Elapsed time: 1.1 seconds
> cat("Memory leaked:", end_mem - start_mem, "bytes\n")
Memory leaked: 16774168 bytes
Yesterday I changed runif(1)
to rpois(1, 100)
, that's the only difference.
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.3
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=fr_BE.UTF-8 LC_NUMERIC=C LC_TIME=fr_BE.UTF-8
[4] LC_COLLATE=fr_BE.UTF-8 LC_MONETARY=fr_BE.UTF-8 LC_MESSAGES=fr_BE.UTF-8
[7] LC_PAPER=fr_BE.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=fr_BE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pryr_0.1.6
loaded via a namespace (and not attached):
[1] compiler_4.2.3 magrittr_2.0.3 cli_3.6.1 tools_4.2.3 glue_1.6.2
[6] rstudioapi_0.14 Rcpp_1.0.10 stringi_1.7.12 codetools_0.2-19 stringr_1.5.0
[11] lifecycle_1.0.3 rlang_1.1.0
Hello,
Am I right or wrong to believe that a fastmap works with external pointers, and therefore cannot be saved as a
rds
file?I want to save a fastmap for a package. My fastmap represents the number
sqrt(2)
for example. Moreover it would be better if it were constant, that is to say if it had only theget
andkeys
methods. I'm wondering whether it would be better to create a simple list with two fieldsget
andkeys
instead of using a fastmap.Edit
But this fastmap must be cloneable too.