Closed stla closed 1 year ago
I think that it's possible to improve performance of copying without writing and C++ code.
Basically, the strategy is to serialize()
and then unserialize()
the fastmap object. The first time that the copy is used (for example, by calling $get()
), it will call C++ code to repopulate some internal data structures, and this takes a bit of time. Still, the overall time is faster than the manual copying in your method.
This example copies a fastmap object with 1 million items, and is 5x faster using the method I described.
library(fastmap)
rm(list=ls())
n <- 1e6
x <- seq_len(n)
m <- fastmap()
m$mset(.list = setNames(x, as.character(x)))
# Using mset(as_list())
invisible(gc())
system.time({
m1 <- fastmap()
m1$mset(.list = m$as_list())
})
#> user system elapsed
#> 2.429 0.025 2.454
# stla's copy function
invisible(gc())
copy <- function(map) {
fm <- fastmap()
for(k in map$keys()) {
fm$set(k, map$get(k))
}
fm
}
system.time({
m2 <- copy(m)
})
#> user system elapsed
#> 2.955 0.029 2.984
# Copy internal values using unserialize(serialize())
invisible(gc())
system.time({
m3 <- unserialize(serialize(m, NULL))
# Calling get() calls ensure_restore_map(), which causes the internal
# key_idx_map to be repopulated. This takes a bit of time.
m3$get('123')
})
#> user system elapsed
#> 0.624 0.024 0.648
It's possible performance could still be better by copying the internal data structure in C++, but still, this is a significant improvement.
I just took a quick look into this, and it should be possible to copy the C++ hopscotch_map
object with the =
operator.
Do you have a specific use case for this? If so, how large is the map you're using?
Hello @wch
My program has a different behavior when I switch from my copy
function given in the OP and the $clone()
method. Would you know what can cause this difference?
Hmm IMHO this is just due to a different order of the elements. Could you please confirm that there should be no difference?
Can you provide a reproducible example?
Hello,
To make a copy of a fastmap I use this function:
I'm afraid this is inefficient for a large fastmap. It would be nice to have a
copy
function which does the copy in C++, if possible.