strazto opened 4 years ago
What happens when you run your data frame on the minimal example below?
library(RSQLite)
library(storr)
# Set up the DBI storr.
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "db_file.db")
s <- storr::storr_dbi(
"tbl_data",
"tbl_keys",
con
)
# Load or create your data.
data <- data.frame(x = 1) # Different for you.
s$set("data", data) # Should reproduce the error.
s$get("data")
#> x
#> 1 1
Created on 2020-03-05 by the reprex package (v0.3.0)
> What happens when you run your data frame on the minimal example below?
I'll give this a try now, thanks @wlandau
As you said, the error was reproduced.
library(RSQLite)
library(storr)
# Set up the DBI storr.
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "db_file.db")
s <- storr::storr_dbi(
"tbl_data",
"tbl_keys",
con
)
# Load or create your data.
data <- readr::read_rds("~/Documents/workflows/bulk_journey_analysis_base_combo.rds")
s$set("data", data) # Should reproduce the error.
s$get("data")
#> Error: NOT NULL constraint failed: tbl_data.value
I dug into the stack trace the other day and investigated this. I'll have to do it again, since some time has passed. From what I saw, the data frame seemed to pass through storr relatively intact; it was at some DBI method dispatch that things suddenly were not as they should have been.
I could swear I raised an issue or documented this somewhere, but now I can't find where. I might have to dig it up again.
@wlandau
In https://github.com/ropensci/drake/issues/1201#issuecomment-595398007 you asked:

> Also, is there a reason you commented out `format = "fst"`? Seems like it would avoid the error and run much faster.

Getting `fst` to compile on my HPC has proved painful.
Okay, debugging in the stack trace now -
traceback()
19: stop(list(message = "NOT NULL constraint failed: tbl_data.value",
call = NULL, cppstack = NULL))
18: result_bind(res@ptr, params)
17: db_bind(res, as.list(params), ..., allow_named_superset = FALSE)
16: dbBind(rs, params)
15: dbBind(rs, params)
14: .local(conn, statement, ...)
13: dbSendQuery(conn, statement, ...)
12: dbSendQuery(conn, statement, ...)
11: dbSendStatement(conn, statement, ...)
10: dbSendStatement(conn, statement, ...)
9: DBI::dbExecute(self$con, self$sql$set_object, dat)
8: DBI::dbExecute(self$con, self$sql$set_object, dat)
7: self$driver$set_object(hash, value_send)
6: self$set_value(value, use_cache)
5: s$set("data", data) at test_cache_break.R#13
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/work/repos/SPEED-EXTRACT.pipeline/inst/investigations/test_cache_break.R",
echo = TRUE)
At the point of `RSQLite::result_bind`, I can perform the following:
ls()
#> [1] "params" "res"
`params` is a large list, containing a small hash and what appears to be a serialized version of my data frame (raw).
str(params)
#> List of 2
#> $ : chr "ac42c7b24b2ed64c449dc6de4e1590a2"
#> $ :List of 1
#> ..$ : raw [1:1485753874] 58 0a 00 00 ...
`res` is an `externalptr` with some value.
The only statement in `result_bind` really just forwards to a C++ call:
invisible(.Call(`_RSQLite_result_bind`, res, params))
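One detail that stands out to me (an observation, not a confirmed cause): the raw payload above is 1,485,753,874 bytes, which is larger than SQLite's compile-time default maximum blob length (`SQLITE_MAX_LENGTH`, 1,000,000,000 bytes). A quick sketch for checking an object's serialized size before handing it to storr; `data` here is a small stand-in for the real data frame:

```r
# Compare an object's serialized size against SQLite's default maximum
# blob length (SQLITE_MAX_LENGTH defaults to 1e9 bytes unless SQLite
# was compiled with a larger limit).
data <- data.frame(x = 1)              # stand-in for the real data frame
payload <- serialize(data, connection = NULL)
length(payload)                        # size in bytes after serialization
length(payload) > 1e9                  # TRUE would put the blob over the default limit
```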
Originally posted here: https://github.com/ropensci/drake/issues/1201
Overview
Within my drake workflow, when trying to store a particular data frame, I keep getting the same error: `NOT NULL constraint failed: tbl_data.value`. This always occurs at the stage of attempting to store my data frame.
With some digging, I am able to find the table name `tbl_data` with respect to my cache.
After reviewing the source for `driver_dbi`, I can see where this NOT NULL constraint is set, and I also see the definition of `set_object`. I wonder if anyone has any idea how this `tbl_data.value` might be taken as NULL?
Lines
NOT NULL constraint set here:
https://github.com/richfitz/storr/blob/0c64f1efb0574b46059a962de9d264a74d448843/R/driver_dbi.R#L171-L173
Defining the `set_object` SQL call:
https://github.com/richfitz/storr/blob/0c64f1efb0574b46059a962de9d264a74d448843/R/driver_dbi.R#L514-L515
Defining the `set_object` method for `driver_dbi`:
https://github.com/richfitz/storr/blob/0c64f1efb0574b46059a962de9d264a74d448843/R/driver_dbi.R#L305-L312
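To take storr out of the picture, here is a minimal sketch (my assumption, not verified against the original data) that writes an oversized blob directly through DBI into a table with a `NOT NULL` value column. The schema below is illustrative, not copied from storr's source, and the blob needs enough RAM to allocate:

```r
# Sketch (assumption): test whether blob size alone can make the insert
# fail, bypassing storr entirely. SQLite's default SQLITE_MAX_LENGTH is
# 1e9 bytes; the blob below deliberately exceeds it.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE tbl_data (hash TEXT PRIMARY KEY, value BLOB NOT NULL)")

big <- raw(1.1e9)  # ~1.1 GB of zero bytes
dbExecute(
  con,
  "INSERT INTO tbl_data (hash, value) VALUES (:hash, :value)",
  params = list(hash = "dummy", value = list(big))
)
dbDisconnect(con)
```

If this insert errors while a small blob succeeds, that would point at the blob size rather than anything storr-specific; I have not confirmed that the error text matches the one above.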