r-lib / fastmap

Fast map implementation for R
https://r-lib.github.io/fastmap/
Other
133 stars 7 forks source link

Copy fastmap #24

Closed stla closed 1 year ago

stla commented 1 year ago

Hello,

To make a copy of a fastmap I use this function:

copy <- function(map) {
  fm <- fastmap()
  for(k in map$keys()) {
    fm$set(k, map$get(k))
  }
  fm
}

I'm afraid this is inefficient for a large fastmap. It would be nice to have a copy function which does the copy in C++, if possible.

wch commented 1 year ago

I think that it's possible to improve performance of copying without writing and C++ code.

Basically, the strategy is to serialize() and then unserialize() the fastmap object. The first time that the copy is used (for example, by calling $get()), it will call C++ code to repopulate some internal data structures, and this takes a bit of time. Still, the overall time is faster than the manual copying in your method.

This example copies a fastmap object with 1 million items, and is 5x faster using the method I described.

library(fastmap)

rm(list=ls())
n <- 1e6
x <- seq_len(n)

m <- fastmap()
m$mset(.list = setNames(x, as.character(x)))

# Using mset(as_list())
invisible(gc())
system.time({
  m1 <- fastmap()
  m1$mset(.list = m$as_list())
})
#>    user  system elapsed 
#>   2.429   0.025   2.454 

# stla's copy function
invisible(gc())
copy <- function(map) {
  fm <- fastmap()
  for(k in map$keys()) {
    fm$set(k, map$get(k))
  }
  fm
}
system.time({
  m2 <- copy(m)
})
#>    user  system elapsed 
#>   2.955   0.029   2.984 

# Copy internal values using unserialize(serialize())
invisible(gc())
system.time({
  m3 <- unserialize(serialize(m, NULL))
  # Calling get() calls ensure_restore_map(), which causes the internal
  # key_idx_map to be repopulated. This takes a bit of time.
  m3$get('123')
})
#>    user  system elapsed 
#>   0.624   0.024   0.648 

It's possible performance could still be better by copying the internal data structure in C++, but still, this is a significant improvement.

wch commented 1 year ago

I just took a quick look into this, and it should be possible to copy the C++ hopscotch_map object with the = operator.

Do you have a specific use case for this? If so, how large is the map you're using?

stla commented 1 year ago

Hello @wch

My program has a different behavior when I switch from my copy function given in the OP and the $clone() method. Would you know what can cause this difference?

stla commented 1 year ago

Hmm IMHO this is just due to a different order of the elements. Could you please confirm that there should be no difference?

wch commented 1 year ago

Can you provide a reproducible example?

stla commented 1 year ago

I have but it is long and complex. Anyway I checked, I get the same results up to the order of the map.