r-dbi / RMySQL

Legacy DBI interface for MySQL
http://cran.r-project.org/package=RMySQL

dbWriteTable inserts partial data for large data frames #200

Open ajzz opened 7 years ago

ajzz commented 7 years ago

DB: MySQL 5.6.27
Package version: RMySQL_0.10.11
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Steps to reproduce:
1) Write a large data frame (> 500K rows, 15 columns) to a table with an existing schema (`append = TRUE`).
2) Observe that typically only 300-400K rows are written, not the entire data frame. The number of rows written varies between runs.

Functioning workaround: batch the write, opening a new connection per batch, e.g.:

    ## Workaround for MySQL driver issues
    batchsize <- 100000

    if (nrow(data) >= batchsize) {
      ## Assign each row a zero-based chunk index of at most `batchsize` rows
      batch <- split(data, (seq_len(nrow(data)) - 1) %/% batchsize)

      op <- lapply(batch, function(x) {
        ## Make a separate connection for each batch
        conn2 <- RMySQL::dbConnect( ... connectionparams ... )
        RMySQL::dbWriteTable(conn = conn2,
                             name = tablename,
                             value = x,
                             append = TRUE,
                             row.names = FALSE)
        DBI::dbDisconnect(conn2)
      })
    } else { ... write normally ... }
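To confirm the partial write, one sketch (assuming `conn`, `tablename`, and `data` from the workaround above; the count query is illustrative, not something RMySQL does for you) is to compare the local row count against the table's:

```r
## Hedged sketch: check how many rows actually landed in the table.
## Assumes an empty target table before the write, and that `conn`,
## `tablename`, and `data` are defined as in the workaround above.
written <- DBI::dbGetQuery(
  conn,
  paste0("SELECT COUNT(*) AS n FROM ", tablename)
)$n

if (written != nrow(data)) {
  warning(sprintf("partial write: %d of %d rows inserted",
                  written, nrow(data)))
}
```

Running this right after `dbWriteTable` makes the silent truncation visible instead of discovering it downstream.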
ajzz commented 7 years ago

Possible causal factor: the temporary file write fails silently when the working directory's storage limit is exceeded.
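If that hypothesis holds, a pre-flight check of free space where R stages its temp files could catch the condition before the write. A minimal sketch (the `TMPDIR`/`/tmp` fallback is an assumption; R's `tempdir()` honors `TMPDIR`):

```shell
#!/bin/sh
# Report free space (1K blocks) in the directory R uses for temp files.
# Falls back to /tmp when TMPDIR is unset -- an assumption, not RMySQL behavior.
tmp="${TMPDIR:-/tmp}"
avail_kb=$(df -Pk "$tmp" | awk 'NR==2 {print $4}')
echo "available in $tmp: ${avail_kb} KB"
```

Comparing `avail_kb` against the expected staged-file size before calling `dbWriteTable` would turn the silent failure into an explicit one.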

raggaraluz commented 7 years ago

Is the connection lost after the insert? Just wondering if it is related to #190.