tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

Error with vroom 1.6.4 on MacOS but not Ubuntu or Windows #519

Open katy-sadowski opened 1 year ago

katy-sadowski commented 1 year ago

In my R package I have several GitHub Actions testthat tests on MacOS (12.7) failing right now with an error like this one:

── Error ('test-listChecks.R:4:3'): listDqChecks works ─────────────────────────
Error in `vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, 
    col_types = col_types, id = id, skip = skip, col_select = col_select, 
    name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, 
    escape_double = escape_double, escape_backslash = escape_backslash, 
    comment = comment, skip_empty_rows = skip_empty_rows, locale = locale, 
    guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep), 
    num_threads = num_threads, progress = progress)`: bad value
Backtrace:
    ▆
 1. └─DataQualityDashboard::listDqChecks() at test-listChecks.R:4:2
 2.   └─readr::read_csv(...)
 3.     └─vroom::vroom(...)
 4.       └─vroom:::vroom_(...)

The same tests pass on Ubuntu, which is also using vroom 1.6.4. For some reason my Windows tests are staying on vroom 1.6.3 in GitHub Actions, but when I run them locally on Windows with 1.6.4 they work fine. The same tests were passing on MacOS a couple of days ago with vroom 1.6.3. I tried specifying vroom <= 1.6.3 in my DESCRIPTION file but got a dependency resolution error on Mac and Ubuntu (I'm not very familiar with how to approach such issues, though, so it's possible I did something wrong there).

I hope this is enough info for you all. Please let me know if not and I'd be happy to provide more detail. Thanks!

jennybc commented 1 year ago

vroom's GitHub Actions checks are all green atm and that includes macOS 12.7:

https://github.com/tidyverse/vroom/actions/runs/6381873068/job/17319275234

So that makes me wonder if there's something specific to your usage or GHA or ???

Can you link to the package and the GHA logs?

katy-sadowski commented 1 year ago

Thanks for your reply! The package is: https://github.com/OHDSI/DataQualityDashboard and GHA logs are here: https://github.com/OHDSI/DataQualityDashboard/actions/runs/6410263439/job/17404120664.

However, I've looked further into this and have some more information:

# dummy test that fails when run as part of R CMD Check on MacOS
test_that("blah", {
  blahh <- readr::read_csv(I("x,y\n1,2\n3,4"))
  expect_true(length(blahh) == 2)
})
# dummy test failure
Error in `vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, 
    col_types = col_types, id = id, skip = skip, col_select = col_select, 
    name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, 
    escape_double = escape_double, escape_backslash = escape_backslash, 
    comment = comment, skip_empty_rows = skip_empty_rows, locale = locale, 
    guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep), 
    num_threads = num_threads, progress = progress)`: bad value
Backtrace:
    ▆
 1. └─readr::read_csv(I("x,y\n1,2\n3,4")) at test-listChecks.R:10:2
 2.   └─vroom::vroom(...)
 3.     └─vroom:::vroom_(...)

btw, I am on the latest version of readr, 2.1.4

Thoughts?

Shians commented 1 year ago

Also having a similar issue

> read_tsv("/var/folders/3p/qh68nr3j5h3054n7llf_9l1r00025j/T/RtmphWTjFR/B6Cast_Prom_1_bl6.tsv")
Error in vroom_(file, delim = delim %||% col_types$delim, col_names = col_names,  :               
  bad value
$ head /var/folders/3p/qh68nr3j5h3054n7llf_9l1r00025j/T/RtmpQIfXAu/B6Cast_Prom_1_bl6.txt
chr pos total   methylated
chr11   101463573   4   1
chr11   101463632   4   2
chr11   101463692   4   1
chr11   101463734   4   1
chr11   101463823   4   2
chr11   101463834   4   1
chr11   101463840   4   1
chr11   101463848   4   1
chr11   101463874   4   2
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5.1
...
 [90] vroom_1.6.4
...
 [127] readr_2.1.4   

I cannot reproduce this consistently: it always breaks inside checks while compiling the vignette, and always works outside of the checking environment.

pitkant commented 1 year ago

I also have the same "bad value" issue as above; I had to downgrade from 1.6.4 to 1.6.3 to get things working again.

> sessionInfo()
R version 4.3.1 Patched (2023-08-16 r84998)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

MarekGierlinski commented 1 year ago

Same here:

x <- readr::read_tsv("file.txt")
Error in vroom_(file, delim = delim %||% col_types$delim, col_names = col_names,  : 
  bad value

Where the file can be anything. I cannot easily reproduce the error consistently; it happens after doing a bit of work in the environment, so it must depend on other factors. A package I am preparing for Bioconductor consistently fails to build on MacOS, but passes on Linux.

Just like @pitkant, it only happens with 1.6.4. Downgrading to 1.6.3 alleviates the issue.
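
A minimal sketch of the downgrade workaround described above, assuming the remotes package is available. `needs_downgrade()` is a hypothetical helper, not part of vroom or readr. `remotes::install_version()` installs from the CRAN archive, which on macOS will probably mean compiling from source. The install call is only printed, not run, so the snippet has no side effects.

```r
# Hypothetical helper (not part of vroom/readr): TRUE when the given version
# string matches the release reported as failing in this thread.
needs_downgrade <- function(installed, bad = "1.6.4") {
  package_version(installed) == package_version(bad)
}

# Only inspect vroom if it is installed at all.
if (requireNamespace("vroom", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("vroom"))
  if (needs_downgrade(installed)) {
    # Print the command rather than running it, so nothing is installed silently.
    message("vroom ", installed, " detected; to pin 1.6.3, run:")
    message('  remotes::install_version("vroom", version = "1.6.3")')
  }
}
```

Note that this only changes the local library; it does not explain the dependency resolution error reported earlier for a version constraint in DESCRIPTION.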

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

MarekGierlinski commented 1 year ago

OK, I think I have managed to create a reproducible example:

df <- data.frame(x = rnorm(100), y = rnorm(100))
write.table(df, "test.tsv", sep = "\t", row.names = FALSE)

library(readr)
library(BiocFileCache)
#> Loading required package: dbplyr

bfc <- BiocFileCache::BiocFileCache("./cache", ask = FALSE)

x <- readr::read_tsv("test.tsv")
#> Error in vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, : bad value

In this case, initialising BiocFileCache causes read_tsv to trip over with the vroom error. This is one weird bug: I'm not even using bfc after it has been created. I have no idea how this affects the seemingly independent read_tsv call.

I can also confirm the error appears only with vroom 1.6.4. The above code works well with vroom 1.6.3.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocFileCache_2.8.0 dbplyr_2.3.3        readr_2.1.4

loaded via a namespace (and not attached):
 [1] crayon_1.5.2     vctrs_0.6.3      httr_1.4.7       cli_3.6.1
 [5] rlang_1.1.1      DBI_1.1.3        purrr_1.0.2      generics_0.1.3
 [9] glue_1.6.2       bit_4.0.5        hms_1.1.3        fansi_1.0.4
[13] filelock_1.0.2   tibble_3.2.1     tzdb_0.4.0       fastmap_1.1.1
[17] lifecycle_1.0.3  memoise_2.0.1    compiler_4.3.1   dplyr_1.1.2
[21] RSQLite_2.3.1    blob_1.2.4       pkgconfig_2.0.3  R6_2.5.1
[25] tidyselect_1.2.0 utf8_1.2.3       parallel_4.3.1   vroom_1.6.4
[29] curl_5.0.2       pillar_1.9.0     magrittr_2.0.3   withr_2.5.0
[33] tools_4.3.1      bit64_4.0.5      cachem_1.0.8

DavisVaughan commented 1 year ago

Refining @MarekGierlinski's reprex:

# RSQLite does use cpp11
con <- RSQLite::datasetsDb()

x <- vroom::vroom(
  I("a\tb\n1.0\t2.0"),
  delim = "\t"
)
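
To check whether a given machine is affected, the reprex above can be wrapped so that it reports the outcome instead of stopping. This is only a sketch: `hits_bad_value()` is a hypothetical name, and it assumes RSQLite (with DBI) and vroom are installed, returning NA when they are not.

```r
# Hypothetical wrapper around the reprex above.
# Returns TRUE if the read fails with "bad value", FALSE if it succeeds or
# fails with a different error, and NA if the needed packages are missing.
hits_bad_value <- function() {
  have_pkgs <- requireNamespace("RSQLite", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("vroom", quietly = TRUE)
  if (!have_pkgs) {
    return(NA)
  }

  # Opening any RSQLite connection is the trigger used in the reprex.
  con <- RSQLite::datasetsDb()
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  tryCatch(
    {
      vroom::vroom(I("a\tb\n1.0\t2.0"), delim = "\t", show_col_types = FALSE)
      FALSE
    },
    error = function(e) grepl("bad value", conditionMessage(e), fixed = TRUE)
  )
}

print(hits_bad_value())
```

Running it in a fresh R session keeps the result independent of whatever else has been loaded.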

DavisVaughan commented 1 year ago

One quick fix seems to be to build RSQLite from source with pak::pak("r-dbi/RSQLite") (also make sure you have CRAN cpp11 before doing this). That makes the problem disappear.

So I'm thinking that CRAN RSQLite (built with "old" cpp11, not sure which exact version, but pre-0.4.6) is somehow incompatible with CRAN vroom (built with "new" cpp11 0.4.6).

DavisVaughan commented 1 year ago

lldb backtrace. Seeing release(this=<unavailable>, cell=<unavailable>) at protect.hpp:300:5 makes me think it is another issue with the preserve list structure, kind of like https://github.com/r-lib/cpp11/issues/330 in a way, but this is a little different.

vroom compilation units should be getting their own preserve list due to https://github.com/r-lib/cpp11/pull/331, so I'm not entirely sure how an "old" version of cpp11 that came with RSQLite could be affecting this.

Exact failure line: https://github.com/tidyverse/vroom/blob/7eef177b3f41f33d55ac1af36433a16c29aa4750/src/columns.h#L271C24-L271C24

It ends up failing in the ~sexp destructor when doing a release() of the character vector created by converting the initializer list.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x00000001011609ec libR.dylib`Rf_error(format="bad value") at errors.c:961:26 [opt]
    frame #1: 0x00000001011c5ed4 libR.dylib`SETCAR(x=0x000000015200ece0, y=0x0000000132202580) at memory.c:4285:2 [opt]
    frame #2: 0x0000000103a14050 vroom.so`cpp11::as_sexp(std::initializer_list<cpp11::r_string>) [inlined] cpp11::$_0::release(this=<unavailable>, cell=<unavailable>) at protect.hpp:300:5 [opt]
    frame #3: 0x0000000103a14014 vroom.so`cpp11::as_sexp(std::initializer_list<cpp11::r_string>) [inlined] cpp11::sexp::~sexp(this=0x000000016f1792d0) at sexp.hpp:58:23 [opt]
    frame #4: 0x0000000103a14014 vroom.so`cpp11::as_sexp(std::initializer_list<cpp11::r_string>) [inlined] cpp11::sexp::~sexp(this=0x000000016f1792d0) at sexp.hpp:58:11 [opt]
    frame #5: 0x0000000103a14014 vroom.so`cpp11::as_sexp(il=initializer_list<cpp11::r_string> @ 0x000000016f1792e8) at r_string.hpp:69:1 [opt]
    frame #6: 0x0000000103a09948 vroom.so`vroom::create_columns(std::__1::shared_ptr<vroom::index_collection>, cpp11::sexp, cpp11::sexp, cpp11::sexp, cpp11::sexp, SEXPREC*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&, cpp11::r_vector<cpp11::r_string>, cpp11::r_vector<SEXPREC*>, unsigned long, unsigned long, cpp11::external_pointer<std::__1::shared_ptr<vroom_errors>, &void cpp11::default_deleter<std::__1::shared_ptr<vroom_errors> >(std::__1::shared_ptr<vroom_errors>*)>, unsigned long) [inlined] cpp11::writable::r_vector<cpp11::r_string>::r_vector(this=0x000000016f1793e8, il=initializer_list<cpp11::r_string> @ 0x0000600000cbd550) at strings.hpp:117:33 [opt]
    frame #7: 0x0000000103a0993c vroom.so`vroom::create_columns(std::__1::shared_ptr<vroom::index_collection>, cpp11::sexp, cpp11::sexp, cpp11::sexp, cpp11::sexp, SEXPREC*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&, cpp11::r_vector<cpp11::r_string>, cpp11::r_vector<SEXPREC*>, unsigned long, unsigned long, cpp11::external_pointer<std::__1::shared_ptr<vroom_errors>, &void cpp11::default_deleter<std::__1::shared_ptr<vroom_errors> >(std::__1::shared_ptr<vroom_errors>*)>, unsigned long) [inlined] cpp11::writable::r_vector<cpp11::r_string>::r_vector(this=0x000000016f1793e8, il=<unavailable>) at strings.hpp:117:68 [opt]
    frame #8: 0x0000000103a0993c vroom.so`vroom::create_columns(idx=std::__1::shared_ptr<vroom::index_collection>::element_type @ 0x0000600000d20678 strong=6 weak=2, col_names=<unavailable>, col_types=<unavailable>, col_select=(data_ = <read memory from 0xc failed (0 of 8 bytes read)>, preserve_token_ = <read memory from 0x14 failed (0 of 8 bytes read)>), name_repair=(data_ = 0x0000000103a11100, preserve_token_ = 0x0000000103a111c0), id=0x000000015200ece0, filenames=<unavailable>, na=cpp11::strings @ 0x000000016f179780, locale=cpp11::list @ 0x000000016f1796e0, altrep=1023, guess_max=100, errors=external_pointer<std::__1::shared_ptr<vroom_errors>, &cpp11::default_deleter> @ 0x000000016f1796f8, num_threads=12)>, unsigned long) at columns.h:271:19 [opt]
    frame #9: 0x0000000103a0720c vroom.so`vroom_(inputs=0x000000016f1799b8, delim=0x0000000122cf3050, quote='"', trim_ws=<unavailable>, escape_double=true, escape_backslash=<unavailable>, comment="", skip_empty_rows=<unavailable>, skip=0, n_max=-1, progress=<unavailable>, col_names=0x000000016f1799a8, col_types=(data_ = 0x000000016f179998, preserve_token_ = 0x000000016f179988), col_select=(data_ = 0x000000016f179988, preserve_token_ = 0x000000016f179978), name_repair=(data_ = 0x000000016f179978, preserve_token_ = 0x000000015200ece0), id=0x000000015200ece0, na=0x000000016f179950, locale=0x000000016f179928, guess_max=100, num_threads=12, altrep=1023) at vroom.cc:79:10 [opt]
    frame #10: 0x00000001039e48f4 vroom.so`::_vroom_vroom_(inputs=<unavailable>, delim=0x0000000122cf3050, quote=<unavailable>, trim_ws=<unavailable>, escape_double=<unavailable>, escape_backslash=<unavailable>, comment=<unavailable>, skip_empty_rows=<unavailable>, skip=0x0000000105889858, n_max=0x0000000105c73d30, progress=0x0000000152010ce0, col_names=0x0000000105889890, col_types=0x0000000105726dc8, col_select=0x00000001057c2580, name_repair=0x0000000105889510, id=0x000000015200ece0, na=0x0000000104e12a08, locale=0x000000010528a3c8, guess_max=0x00000001058895b8, num_threads=0x0000000105d967f0, altrep=0x00000001045620a8) at cpp11.cpp:62:27 [opt]
    frame #11: 0x0000000101138edc libR.dylib`R_doDotCall(fun=<unavailable>, nargs=21, cargs=0x000000016f17c0c8, call=0x00000001057ce740) at dotcode.c:0 [opt]
    frame #12: 0x00000001011391bc libR.dylib`do_dotcall(call=0x00000001057ce740, op=<unavailable>, args=<unavailable>, env=<unavailable>) at dotcode.c:1551:11 [opt]
    frame #13: 0x000000010116ea24 libR.dylib`bcEval(body=0x00000001057ce7e8, rho=<unavailable>, useCache=<unavailable>) at eval.c:7446:14 [opt]
    frame #14: 0x0000000101167048 libR.dylib`Rf_eval(e=0x00000001057ce7e8, rho=0x00000001057d1348) at eval.c:1013:8 [opt]
    frame #15: 0x0000000101183ccc libR.dylib`R_execClosure(call=0x00000001056d1230, newrho=0x00000001057d1348, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x00000001057cf038) at eval.c:0 [opt]
    frame #16: 0x000000010118254c libR.dylib`Rf_applyClosure(call=0x00000001056d1230, op=0x00000001057cf038, arglist=0x00000001057ce1c8, rho=0x00000001056d4820, suppliedvars=<unavailable>) at eval.c:2113:16 [opt]
    frame #17: 0x000000010116e204 libR.dylib`bcEval(body=0x00000001056cfdc8, rho=<unavailable>, useCache=<unavailable>) at eval.c:7414:12 [opt]
    frame #18: 0x0000000101167048 libR.dylib`Rf_eval(e=0x00000001056cfdc8, rho=0x00000001056d4820) at eval.c:1013:8 [opt]
    frame #19: 0x0000000101183ccc libR.dylib`R_execClosure(call=0x0000000122b01f60, newrho=0x00000001056d4820, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x00000001056ccd60) at eval.c:0 [opt]
    frame #20: 0x000000010118254c libR.dylib`Rf_applyClosure(call=0x0000000122b01f60, op=0x00000001056ccd60, arglist=0x00000001056d4e08, rho=0x0000000152047b88, suppliedvars=<unavailable>) at eval.c:2113:16 [opt]
    frame #21: 0x000000010116731c libR.dylib`Rf_eval(e=0x0000000122b01f60, rho=0x0000000152047b88) at eval.c:1140:12 [opt]
    frame #22: 0x00000001011869a8 libR.dylib`do_set(call=<unavailable>, op=0x000000015200ce08, args=0x0000000122b01ef0, rho=0x0000000152047b88) at eval.c:3250:8 [opt]
    frame #23: 0x0000000101167248 libR.dylib`Rf_eval(e=0x0000000122b01eb8, rho=0x0000000152047b88) at eval.c:1092:12 [opt]
    frame #24: 0x00000001011bb3b4 libR.dylib`Rf_ReplIteration(rho=0x0000000152047b88, savestack=<unavailable>, browselevel=<unavailable>, state=0x000000016f17e130) at main.c:262:2 [opt]
    frame #25: 0x00000001011bc928 libR.dylib`R_ReplConsole(rho=0x0000000152047b88, savestack=0, browselevel=0) at main.c:314:11 [opt]
    frame #26: 0x00000001011bc864 libR.dylib`run_Rmainloop at main.c:1200:5 [opt]
    frame #27: 0x00000001011bc9d0 libR.dylib`Rf_mainloop at main.c:1207:5 [opt]
    frame #28: 0x0000000100c83ea0 R`main + 32
    frame #29: 0x00000001a03bff28 dyld`start + 2236

mahekvirani commented 1 year ago

I was also getting the same "bad value" vroom error as mentioned above. I was able to work around it by adding the command readr::local_edition(1) before the lines that were causing the error, as mentioned for the fenr package under Bioconductor/Contributions#3017.
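
A rough sketch of how that workaround might be scoped to a single function, assuming readr 2.x is installed. `read_tsv_legacy()` is a hypothetical wrapper name. In edition 1, readr uses its own legacy parser rather than vroom, which is presumably why the failing code path is avoided.

```r
# Hypothetical wrapper: read a TSV with readr's first edition (legacy parser).
# local_edition() is scoped to the calling frame, so the edition is restored
# automatically when this function returns.
read_tsv_legacy <- function(path, ...) {
  readr::local_edition(1)
  readr::read_tsv(path, ...)
}

if (requireNamespace("readr", quietly = TRUE)) {
  # Small throwaway file, just to exercise the wrapper.
  tmp <- tempfile(fileext = ".tsv")
  writeLines(c("x\ty", "1\t2", "3\t4"), tmp)
  df <- read_tsv_legacy(tmp)
  print(dim(df))
  unlink(tmp)
}
```

`readr::with_edition(1, ...)` is an alternative when only a single expression needs the legacy parser.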