Given a big table `big` and a tiny dictionary `dict`, merging the two creates an additional copy of the data, even when the result is assigned to the same variable in R. If we only want to look up a new column from the dictionary into the big table, we have to remove the original H2O frame afterwards to avoid keeping that copy. It would be handy to be able to add the column in place during the join.

The code below shows the current workflow: join, then remove the original H2O frame.

{code}
memory_usage <- function() {
  res <- h2o:::.h2o.fromJSON(jsonlite::fromJSON(
    h2o:::.h2o.doSafeGET(urlSuffix = h2o:::.h2o.__CLOUD),
    simplifyDataFrame = FALSE
  ))
  # total size of stored values across all nodes, in MB
  sum(sapply(res$nodes, `[[`, "mem_value_size") / (1024^2))
}

library(h2o)
h2o.init()
memory_usage()
# [1] 0

big = as.h2o(iris)
dict = data.frame(Species=c("virginica","versicolor","setosa"),
                  new_species=c(rep("versinica",2), "setosinica"))
dict = as.h2o(dict)
h2o.ls()
#    key
# 1 dict
# 2 iris
memory_usage()
# [1] 0.0078125

# the join creates a new frame; the original "iris" frame stays in the cluster
big = h2o.merge(big, dict, by="Species")
h2o.ls()
#               key
# 1 RTMP_sid_9aec_8
# 2            dict
# 3            iris
memory_usage()
# [1] 0.015625
h2o.getId(big)
# [1] "RTMP_sid_9aec_8"

# remove the original frame manually to free its memory
h2o.rm("iris")
h2o.ls()
#               key
# 1 RTMP_sid_9aec_8
# 2            dict
memory_usage()
# [1] 0.01074219
{code}
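
Until an in-place lookup exists, the join-then-remove steps above could be wrapped in a small helper. The following is only a sketch of that workaround: `merge_and_drop` is a hypothetical name, not part of the h2o package, and it assumes a running cluster (`h2o.init()`) and that `big` is not needed under its old key afterwards. It still creates the temporary copy during the join; it only makes sure the original frame is removed right after.

```r
# Hypothetical helper (not part of h2o): join `dict` onto `big`,
# then remove the original frame so only the merged copy remains.
merge_and_drop <- function(big, dict, by) {
  # remember the key of the original frame before it is replaced
  old_id <- h2o::h2o.getId(big)
  # the join itself still allocates a new frame in the cluster
  merged <- h2o::h2o.merge(big, dict, by = by)
  # drop the original frame to release its memory
  h2o::h2o.rm(old_id)
  merged
}
```

With the frames from the example, `big = merge_and_drop(big, dict, by = "Species")` should leave only the merged frame and `dict` in `h2o.ls()`, matching the final state shown above.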

h2oai / h2o-3

Lookup column: add column in-place during join #10375
