Open mb706 opened 1 year ago
@mb706 I've found the same, though on a much smaller scale: the memory allocation was higher than it should be. This is due to the do.call(data.table, c(cols, list(check.names = TRUE))) call in https://github.com/mlr-org/mlr3misc/blob/main/R/purrr_map.R#L129. As a fix I've used the following, i.e. using setDT():
map_dtc = function(.x, .f, ...) {
  cols = map(.x, .f, ...)
  setDT(unlist(cols, recursive = FALSE))[]
}
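To see what the setDT() variant returns, here is a minimal sketch. It assumes that base lapply() is an acceptable stand-in for map(), and map_dtc_sketch is a hypothetical name used only for this illustration, not part of mlr3misc. It also shows one side effect of unlist(): when the input is named, the outer names are prepended to the column names.

```r
library(data.table)

# Hypothetical stand-in for the proposed map_dtc(); lapply() replaces map()
# so that the sketch only depends on data.table.
map_dtc_sketch = function(.x, .f, ...) {
  cols = lapply(.x, .f, ...)
  # Flatten the list of tables into one list of columns, then convert
  # that list to a data.table by reference.
  setDT(unlist(cols, recursive = FALSE))[]
}

res = map_dtc_sketch(list(x = 1, y = 2), function(i) data.table(a = i, b = i * 2))

# unlist() combines outer and inner names, so the columns get prefixed.
print(names(res))
# "x.a" "x.b" "y.a" "y.b"
```

With an unnamed input the inner column names are kept as they are, so whether the prefixes appear depends on the names of .x.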
Perhaps we can do something like the following to accommodate both use cases:
map_dtc = function(.x, .f, ...) {
  cols = map(.x, .f, ...)
  j = map_lgl(cols, function(x) !is.null(dim(x)) && !is.null(colnames(x)))
  names(cols)[j] = ""
  if (inherits(cols[[1L]], "data.table")) {
    cols = unlist(cols, recursive = FALSE)
  }
  setDT(cols)[]
}
There is also a PR for a C implementation of cbindlist, but it seems it will take quite a while until that is merged: https://github.com/Rdatatable/data.table/pull/4370
When the function in map_dtc returns a data.table with many rows, map_dtc appears to be slower than it needs to be by a factor of about 100. profvis tells me this is because name_dots is called in data.table.
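A rough way to compare the two construction paths discussed in this thread is sketched below. The input (20 one-column tables with 1e5 rows each) is an assumption made up for illustration, and the timings will depend on the machine and the installed data.table version, so no expected numbers are given.

```r
library(data.table)

# Assumed test input: 20 data.tables with one column and 1e5 rows each,
# with distinct column names V1 ... V20.
cols = lapply(seq_len(20), function(i) {
  dt = data.table(x = runif(1e5))
  setnames(dt, "x", paste0("V", i))
  dt
})

# Path 1: the do.call(data.table, ...) construction quoted in the thread.
t_call = system.time({
  res_call = do.call(data.table, c(cols, list(check.names = TRUE)))
})

# Path 2: flatten to a plain list of columns and convert with setDT().
t_set = system.time({
  res_set = setDT(unlist(cols, recursive = FALSE))[]
})

# Elapsed seconds for each path.
print(c(do_call = t_call[["elapsed"]], setDT = t_set[["elapsed"]]))
```

Both paths should produce tables with the same column names and dimensions for this input, so the comparison only concerns construction time.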