Open jangorecki opened 4 years ago
It seems that this issue affects more than the 1e9_2e0_0_0 data case. The data case that starts right after the k=2e0 failure also failed, at the very beginning.
stdout
# groupby-datatable.R
loading dataset G1_1e9_1e2_0_1
System errno 22 unmapping file: Invalid argument
stderr
Error in fread(src_grp, showProgress = FALSE, stringsAsFactors = TRUE) :
Opened 47.09GB (50558868357 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
Execution halted
Update: there is now a 15-second sleep between benchmark scripts. That seems to eliminate the undesired impact of the previous script on the next one.
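A minimal sketch of how such a pause between scripts could look in R. The function `run_with_pause` and its arguments are hypothetical names, not the benchmark's actual launcher; the `run` and `pause` arguments exist only so the loop can be exercised without starting real processes.

```r
# Hypothetical runner: executes each script in a fresh R process and
# pauses between consecutive runs so that memory released by one process
# has time to be returned to the OS before the next one starts.
run_with_pause <- function(scripts, pause_sec = 15,
                           run = function(s) system2("Rscript", s),
                           pause = Sys.sleep) {
  status <- integer(length(scripts))
  for (i in seq_along(scripts)) {
    # system2() returns the exit status of the child process
    status[i] <- run(scripts[i])
    # Pause only between scripts, not after the last one
    if (i < length(scripts)) pause(pause_sec)
  }
  stats::setNames(status, scripts)
}

# Example call (left commented out because it launches real processes;
# the second script name is assumed for illustration):
# run_with_pause(c("groupby-datatable.R", "groupby-dplyr.R"))
```

Whether 15 seconds is enough likely depends on the machine and the dataset size; the value here simply mirrors the one mentioned above.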
This problem has been described in https://bugs.r-project.org/bugzilla/show_bug.cgi?id=18003
The script gets stuck (and is eventually killed after exceeding the timeout) because R's gc takes too much time. Without the timeout, the script is killed by the OS after around 6 hours. Even if it could finish at some point, this behaviour is not acceptable. A package-agnostic reproducible example should be produced and submitted to R-devel to investigate the behaviour.
This produces a 1e9-row, K=2 (unbalanced) dataset, and then running the data.table and dplyr groupby scripts on a machine with 125GB of memory will take us to this issue. Note that recent dplyr will fail even sooner due to #152, so an older version should be used instead.
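As a possible starting point for the package-agnostic example, here is a small base R sketch that times a garbage collection while a growing number of small objects is kept alive. It does not reproduce the problem described above, and it is an assumption that the number of live objects is the relevant factor; `time_gc` and the chosen sizes are illustrative only.

```r
# Time one garbage collection while `n_objects` small vectors are alive.
time_gc <- function(n_objects) {
  # Many distinct small vectors give the collector many nodes to traverse
  live <- lapply(seq_len(n_objects), function(i) c(i, i + 1L))
  elapsed <- system.time(gc())[["elapsed"]]
  # Touch `live` after timing so it is still reachable during the collection
  stopifnot(length(live) == n_objects)
  elapsed
}

# Sizes are far below the benchmark's scale; they only show how the
# measurement could be set up.
sizes <- c(1e4, 1e5, 1e6)
timings <- vapply(sizes, time_gc, numeric(1))
print(data.frame(n_objects = sizes, gc_seconds = timings))
```

If the timings grow faster than the object count, that would be a hint worth scaling up and reporting; if not, the trigger is probably something else in the groupby scripts.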