nograpes opened this issue 2 years ago
I see this too. Here's a slightly tweaked reprex:
library(vroom)
times <- c(
  "31JAN2015:18:47:49", "31JAN2015:19:35:09", "31JAN2015:21:10:28",
  "31JAN2015:20:02:19", "31JAN2015:18:04:39", "31JAN2015:19:58:32",
  "31JAN2015:18:07:25", "31JAN2015:18:30:29", "31JAN2015:19:54:57",
  "31JAN2015:20:17:13", "31JAN2015:19:44:46", "31JAN2015:20:30:18",
  "31JAN2015:20:01:47", "31JAN2015:20:35:36", "31JAN2015:20:21:47",
  "31JAN2015:18:39:52", "31JAN2015:20:51:51", "31JAN2015:21:26:30",
  "31JAN2015:21:27:06", "31JAN2015:20:07:45", "31JAN2015:22:02:21",
  "31JAN2015:20:35:48", "31JAN2015:20:23:30", "31JAN2015:21:10:12",
  "31JAN2015:22:05:21", "31JAN2015:20:26:31", "31JAN2015:22:16:10",
  "31JAN2015:22:11:14", "01FEB2015:01:08:45"
)
file <- tempfile()
write.csv(data.frame(a = times), file, row.names = FALSE, fileEncoding = "latin1")
probs <- function() {
  test <- vroom(
    file,
    delim = ",",
    progress = FALSE,
    num_threads = 2,
    locale = locale(encoding = "latin1"),
    col_types = cols(a = col_datetime(format = "%d%b%Y:%H:%M:%OS"))
  )
  problems(test)
}
first <- suppressWarnings(replicate(1000, probs(), simplify = FALSE))
dplyr::bind_rows(first, .id = "id")
#> # A tibble: 14 × 6
#> id row col expected actual file
#> <chr> <int> <int> <chr> <chr> <chr>
#> 1 79 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 2 85 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 3 132 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 4 133 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 5 243 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 6 459 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 7 470 13 1 date like %d%b%Y:%H:%M:%OS 31JAN2015:20:30:18 /private/tmp…
#> 8 552 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 9 592 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 10 680 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 11 706 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 12 747 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 13 866 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
#> 14 881 30 1 date like %d%b%Y:%H:%M:%OS 01FEB2015:01:08:45 /private/tmp…
Created on 2023-08-01 with reprex v2.0.2
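In the output above, 14 of the 1000 reads reported a problem. As a rough guide for choosing the number of replications in a nondeterministic reprex, here is a minimal sketch that assumes each read fails independently with the same probability `p` (an assumption, not something measured here). `replications_needed()` is a hypothetical helper, not part of vroom.

```r
# Hypothetical helper: smallest n such that P(at least one failure in n reads)
# >= confidence, ASSUMING each read fails independently with probability p.
# P(no failure in n reads) = (1 - p)^n, so solve (1 - p)^n <= 1 - confidence.
replications_needed <- function(p, confidence = 0.95) {
  ceiling(log(1 - confidence) / log(1 - p))
}

# Estimated per-read failure rate from the reprex output: 14 failing reads / 1000.
p_hat <- 14 / 1000
n_needed <- replications_needed(p_hat)
n_needed
#> [1] 213
```

Under that independence assumption, a few hundred reads should usually be enough to trigger at least one failure at this rate; a lower true rate would need proportionally more.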
It's weird that the encoding matters for the reprex, given that it's a pure ASCII file.
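One quick way to confirm that a file written as in the reprex really is pure ASCII is to check its raw bytes. A minimal sketch, using a two-value subset of the reprex's timestamps (the variable names are illustrative only):

```r
# Write a small sample the same way the reprex does (subset of its timestamps).
sample_times <- c("31JAN2015:18:47:49", "01FEB2015:01:08:45")
sample_file <- tempfile(fileext = ".csv")
write.csv(data.frame(a = sample_times), sample_file,
          row.names = FALSE, fileEncoding = "latin1")

# Read every byte of the file and check that each lies in the 7-bit ASCII range.
bytes <- readBin(sample_file, what = "raw", n = file.size(sample_file))
is_ascii <- all(as.integer(bytes) < 128L)
is_ascii
#> [1] TRUE
```

Since every byte is below 128, latin1 and UTF-8 decode the file identically, which is why the encoding's influence on the result is surprising.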
Thank you for the lovely package. When I use vroom to parse a file containing datetime values with the latin1 encoding and more than one thread, it randomly, but very rarely, reports that certain times are not formatted as expected.
I have tried to make this example minimal, but because the failure isn't deterministic, I have had to guess at the size of the data and the number of replications needed to consistently produce at least one error. Below is code for the bug reproduction.
I would expect that code not to fail on any read. Even if there were an error, I would expect it to be the same error every time. But on every machine I have tested, some reads fail on random rows, like:
I have reproduced this issue on Windows and Linux with vroom 1.5.7 and R 4.1.3, and also with the development version of vroom (1.6.0.9000). I also tested with R 3.6.3 on Linux.
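To narrow down whether the failures depend on the thread count, the encoding, or both, one option is to repeat the reprex's read under each combination and count how many reads report problems. This is a sketch that reuses only the `vroom()`, `locale()`, `cols()`, `col_datetime()` and `problems()` calls from the reprex; `count_failing_reads()` is a hypothetical helper, and the choice of `"UTF-8"` as the comparison encoding is an assumption.

```r
library(vroom)

# The 29 timestamps from the reprex.
times <- c(
  "31JAN2015:18:47:49", "31JAN2015:19:35:09", "31JAN2015:21:10:28",
  "31JAN2015:20:02:19", "31JAN2015:18:04:39", "31JAN2015:19:58:32",
  "31JAN2015:18:07:25", "31JAN2015:18:30:29", "31JAN2015:19:54:57",
  "31JAN2015:20:17:13", "31JAN2015:19:44:46", "31JAN2015:20:30:18",
  "31JAN2015:20:01:47", "31JAN2015:20:35:36", "31JAN2015:20:21:47",
  "31JAN2015:18:39:52", "31JAN2015:20:51:51", "31JAN2015:21:26:30",
  "31JAN2015:21:27:06", "31JAN2015:20:07:45", "31JAN2015:22:02:21",
  "31JAN2015:20:35:48", "31JAN2015:20:23:30", "31JAN2015:21:10:12",
  "31JAN2015:22:05:21", "31JAN2015:20:26:31", "31JAN2015:22:16:10",
  "31JAN2015:22:11:14", "01FEB2015:01:08:45"
)
file <- tempfile()
write.csv(data.frame(a = times), file, row.names = FALSE, fileEncoding = "latin1")

# Hypothetical helper: read `path` n times with the given settings and
# return how many of those reads reported at least one parsing problem.
count_failing_reads <- function(path, num_threads, encoding, n = 1000) {
  failed <- replicate(n, {
    df <- suppressWarnings(vroom(
      path,
      delim = ",",
      progress = FALSE,
      num_threads = num_threads,
      locale = locale(encoding = encoding),
      col_types = cols(a = col_datetime(format = "%d%b%Y:%H:%M:%OS"))
    ))
    nrow(problems(df)) > 0
  })
  sum(failed)
}

# Every combination of thread count and encoding.
settings <- expand.grid(
  num_threads = c(1, 2),
  encoding = c("UTF-8", "latin1"),
  stringsAsFactors = FALSE
)
settings$failing_reads <- mapply(
  count_failing_reads,
  num_threads = settings$num_threads,
  encoding = settings$encoding,
  MoreArgs = list(path = file)
)
settings
```

If the report above holds, only the `num_threads = 2`, `latin1` row should show a nonzero count; because the failure is random, a zero in any single run is weak evidence on its own.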