r-dbi / RMariaDB

An R interface to MariaDB
https://rmariadb.r-dbi.org
Other
128 stars 40 forks source link

Potential Memory Leak in 1.3.0? #309

Closed GraphZal closed 10 months ago

GraphZal commented 11 months ago

We have noticed a significant increase of memory consumption in 1.3.0 compared to 1.2.2, especially when reading large tables. That memory also isn't freed after closing and destroying the database connection. For example, reading a 175MB table (data length reported by MySQL Workbench) results in ~600MB additional memory usage each time that table is read, i.e. reading it twice adds 1.2GB memory usage. Reverting to RMariaDB version 1.2.2 stops the issue from showing up.

Our Database server runs on MySQL 8.0.34, if that's important.

I'm not quite sure how to provide a useful reprex, considering a database connection is involved. If there's something I can do that makes it more useful, please don't hesitate to ask for it!

conn2 <- DBI::dbConnect(
  drv = RMariaDB::MariaDB(),
  dbname = "<scrubbed>",
  host = "<scrubbed>",
  port = <scrubbed>,
  client.flag = 0,
  username = "<scrubbed>",
  password = "<scrubbed>"
  )
gc()
#>           used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells  982745 52.5    1911970 102.2  1437825 76.8
#> Vcells 1694914 13.0    8388608  64.0  2839637 21.7
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>            used  (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells   986156  52.7    1911970 102.2  1437825  76.8
#> Vcells 80231910 612.2  105323487 803.6 87372344 666.6
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
#> Ncells    986395   52.7    1911970  102.2   1437825   76.8
#> Vcells 155208358 1184.2  198340906 1513.3 162327962 1238.5
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
#> Ncells    986634   52.7    1911970  102.2   1437825   76.8
#> Vcells 230184808 1756.2  296869810 2265.0 240858902 1837.7

With 1.2.2:

conn2 <- DBI::dbConnect(
  drv = RMariaDB::MariaDB(),
  dbname = "<scrubbed>",
  host = "<scrubbed>",
  port = <scrubbed>,
  client.flag = 0,
  username = "<scrubbed>",
  password = "<scrubbed>"
  )
gc()
#>           used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 1019389 54.5    2016667 107.8  1437825 76.8
#> Vcells 1758110 13.5    8388608  64.0  3082381 23.6
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>            used  (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells  1021670  54.6    2016667 107.8  1437825  76.8
#> Vcells 17757486 135.5   55061346 420.1 57938202 442.1
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>            used  (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells  1021684  54.6    2016667 107.8  1437825  76.8
#> Vcells 17757517 135.5   69716149 531.9 73918443 564.0
tab <- DBI::dbReadTable(conn2, "<some_large_table_here>")
gc()
#>            used  (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells  1021698  54.6    2016667 107.8  1437825  76.8
#> Vcells 17757548 135.5   55772920 425.6 73918443 564.0
krlmlr commented 11 months ago

Thanks, confirmed:

library(RMariaDB)
con <- mariadbDefault()

dbWriteTable(con, "mtcars", mtcars, temporary = TRUE)

gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  900713 48.2    1449517 77.5         NA  1449517 77.5
#> Vcells 1548618 11.9    8388608 64.0      24576  2679154 20.5

for (i in 1:10) {
  for (i in 1:1000) {
    dbReadTable(con, "mtcars")
  }

  print(gc())
}
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used (Mb)
#> Ncells  979023 52.3    1893577 101.2         NA  1449517 77.5
#> Vcells 2067132 15.8    8388608  64.0      24576  2679154 20.5
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  998410 53.4    1893577 101.2         NA  1893577 101.2
#> Vcells 2453800 18.8    8388608  64.0      24576  2849624  21.8
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1016990 54.4    1893577 101.2         NA  1893577 101.2
#> Vcells 2839123 21.7    8388608  64.0      24576  3252164  24.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1036049 55.4    1893577 101.2         NA  1893577 101.2
#> Vcells 3225242 24.7    8388608  64.0      24576  3620045  27.7
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1055108 56.4    1893577 101.2         NA  1893577 101.2
#> Vcells 3611353 27.6    8388608  64.0      24576  3989247  30.5
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1074179 57.4    1893577 101.2         NA  1893577 101.2
#> Vcells 3997484 30.5    8388608  64.0      24576  4358324  33.3
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1093253 58.4    1893577 101.2         NA  1893577 101.2
#> Vcells 4383622 33.5   10146329  77.5      24576  4727942  36.1
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1112300 59.5    1893577 101.2         NA  1893577 101.2
#> Vcells 4769716 36.4   10146329  77.5      24576  5093708  38.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1131377 60.5    1893577 101.2         NA  1893577 101.2
#> Vcells 5155860 39.4   10146329  77.5      24576  5504430  42.0
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells 1150016 61.5    1893577 101.2         NA  1893577 101.2
#> Vcells 5541274 42.3   12255594  93.6      24576  5886955  45.0

dbDisconnect(con)

Created on 2023-10-13 with reprex v2.0.2

krlmlr commented 11 months ago

From git bisect:

The first bad commit could be any of:

Expanding this to a commit range that can be built on my machine gives 3e54aeefd7c651633bb830f58b0444735f63b902...39e52c68067fc069580eca9570523c5b5354b74a. The first commit in this range breaks the builds on my system, perhaps we could narrow this down to an even smaller changeset, but this might be unnecessary.

GraphZal commented 11 months ago

Can confirm that 3e54aeefd7c651633bb830f58b0444735f63b902 is good. I was able to build 1735206b04013259825e80da40d256cff6bf0ab0 and it has the problem. Everything in-between those two failed to build for me.

krlmlr commented 10 months ago

Thanks. Breaking down the first commit to understand what's happening here.

krlmlr commented 10 months ago

Smaller breaking commit: 0b6d64a52de902e358b5de423954bd2d5be66ad8

🤯

krlmlr commented 10 months ago

@Antonov548: why would 0b6d64a52de902e358b5de423954bd2d5be66ad8 lead to a memory leak? Can you please take a look?

See https://github.com/r-dbi/RMariaDB/issues/309#issuecomment-1760764345 for a reproducible example.

krlmlr commented 10 months ago

In particular, 2307e13801ba4ac99ec652b2c775597f4574328c appears to fix it. Does this mean that a function that returns a writable list leads to a memory leak?

krlmlr commented 10 months ago

I tried to replicate, to no avail so far. Tried many variants of the code below.

@Antonov548: please let me know if you find something. Bottom line until then: avoid declaring functions that return cpp11::writable::* .

cpp11::cpp_source(quiet = FALSE, cxx_std = "CXX17", code = "
#include <cpp11.hpp>

cpp11::writable::list test_list() {
  cpp11::writable::list x(10000);
  x[0] = Rf_allocVector(REALSXP, 10000);
  return (SEXP)x;
}

[[cpp11::register]]
SEXP test_list3() {
  auto out = test_list();
  out[1] = Rf_allocVector(INTSXP, 10000);
  return out;
}
")
#> ℹ 1 functions decorated with [[cpp11::register]]
#> using C++ compiler: ‘Apple clang version 15.0.0 (clang-1500.0.40.1)’
#> using C++17
#> using SDK: ‘MacOSX14.0.sdk’
#> ccache clang++ -arch arm64 -std=gnu++17 -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/include" -DNDEBUG -I'/Users/kirill/Library/R/arm64/4.3/library/cpp11/include'  -I/opt/homebrew/include    -fPIC  -O0 -g -Wmacro-redefined -Wno-everything -c /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/code_16f8262bac18f.cpp -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/code_16f8262bac18f.o
#> ccache clang++ -arch arm64 -std=gnu++17 -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/include" -DNDEBUG -I'/Users/kirill/Library/R/arm64/4.3/library/cpp11/include'  -I/opt/homebrew/include    -fPIC  -O0 -g -Wmacro-redefined -Wno-everything -c /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/cpp11.cpp -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/cpp11.o
#> ccache clang++ -arch arm64 -std=gnu++17 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib -L/opt/homebrew/lib -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/code_16f8262bac18f.so /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/code_16f8262bac18f.o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpcSwXd8/file16f823478bcea/src/cpp11.o -F/Library/Frameworks/R.framework/Versions/4.3-arm64 -framework R -Wl,-framework -Wl,CoreFoundation
#> ld: warning: -single_module is obsolete
#> ld: warning: -multiply_defined is obsolete

gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  842187 45.0    1449470 77.5         NA  1449470 77.5
#> Vcells 1504970 11.5    8388608 64.0      24576  2321117 17.8

for (i in 1:3) {
  for (j in 1:10000) {
    test_list3()
  }
  print(gc())
}
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  842784 45.1    1449470 77.5         NA  1449470 77.5
#> Vcells 1506676 11.5    8388608 64.0      24576  8385763 64.0
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  842784 45.1    1449470 77.5         NA  1449470 77.5
#> Vcells 1506699 11.5    8388608 64.0      24576  8385763 64.0
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  842785 45.1    1449470 77.5         NA  1449470 77.5
#> Vcells 1506711 11.5    8388608 64.0      24576  8385763 64.0

Created on 2023-10-23 with reprex v2.0.2

krlmlr commented 10 months ago

Even tried the following, with https://github.com/r-lib/cpp11/pull/337, to no avail:

f1 <- "
#include <cpp11.hpp>

cpp11::writable::list test_list() {
  cpp11::writable::list x(10000);
  x[0] = Rf_allocVector(REALSXP, 10000);
  return x;
}
"

writeLines(f1, "f1.cpp")

f2 <- "
#include <cpp11.hpp>

cpp11::writable::list test_list();

[[cpp11::register]]
SEXP test_list3() {
  auto out = test_list();
  out[1] = Rf_allocVector(INTSXP, 10000);
  return out;
}
"

writeLines(f2, "f2.cpp")

cpp11::cpp_source(file = c("f1.cpp", "f2.cpp"), quiet = FALSE, cxx_std = "CXX17")
#> ℹ 1 functions decorated with [[cpp11::register]]
#> using C++ compiler: ‘Apple clang version 15.0.0 (clang-1500.0.40.1)’
#> using C++17
#> using SDK: ‘MacOSX14.0.sdk’
#> ccache clang++ -arch arm64 -std=gnu++17 -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/include" -DNDEBUG -I'/Users/kirill/Library/R/arm64/4.3/library/cpp11/include'  -I/opt/homebrew/include    -fPIC  -O0 -g -Wmacro-redefined -Wno-everything -c /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f1.cpp -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f1.o
#> ccache clang++ -arch arm64 -std=gnu++17 -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/include" -DNDEBUG -I'/Users/kirill/Library/R/arm64/4.3/library/cpp11/include'  -I/opt/homebrew/include    -fPIC  -O0 -g -Wmacro-redefined -Wno-everything -c /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f2.cpp -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f2.o
#> ccache clang++ -arch arm64 -std=gnu++17 -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/include" -DNDEBUG -I'/Users/kirill/Library/R/arm64/4.3/library/cpp11/include'  -I/opt/homebrew/include    -fPIC  -O0 -g -Wmacro-redefined -Wno-everything -c /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/cpp11.cpp -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/cpp11.o
#> ccache clang++ -arch arm64 -std=gnu++17 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib -L/opt/homebrew/lib -o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f1.so /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f1.o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/f2.o /private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpYh4lTi/file179317a1325f6/src/cpp11.o -F/Library/Frameworks/R.framework/Versions/4.3-arm64 -framework R -Wl,-framework -Wl,CoreFoundation
#> ld: warning: -single_module is obsolete
#> ld: warning: -multiply_defined is obsolete

gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  844854 45.2    1449328 77.5         NA  1449328 77.5
#> Vcells 1510172 11.6    8388608 64.0      24576  2588324 19.8

for (i in 1:3) {
  for (j in 1:10000) {
    test_list3()
  }
  print(gc())
}
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  845450 45.2    1449328 77.5         NA  1449328 77.5
#> Vcells 1511858 11.6    8388608 64.0      24576  8387316 64.0
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  845450 45.2    1449328 77.5         NA  1449328 77.5
#> Vcells 1511881 11.6    8388608 64.0      24576  8387316 64.0
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  845451 45.2    1449328 77.5         NA  1449328 77.5
#> Vcells 1511893 11.6    8388608 64.0      24576  8387316 64.0

Created on 2023-10-23 with reprex v2.0.2

Antonov548 commented 10 months ago

Hello @krlmlr

I have briefly checked. I remember I already fixed something simlar during migration from Rcpp to cpp11. I didn't find reference in my pull reuqests. But I remember that writable shouldn't be returned from functions.

It's something related to sequence of called constructors and attributes that stored in vector. Before returning writable::list it also calls operator SEXP() which converts list to SEXP and only then again to writable::list.

I can check it deeper, but I remember last time I spend quite a lot of time to understand this.

krlmlr commented 10 months ago

Brilliant, thanks! Found a reprex, will file an issue in cpp11.