tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

skip_empty_rows = TRUE fails in vroom_fwf #501

Open richardjtelford opened 1 year ago

richardjtelford commented 1 year ago

I'm having problems that appear to stem from vroom_fwf, and hence readr::read_fwf, ignoring the skip_empty_rows argument.

Here is a minimal working example.

f <- "a b c

1 2 3

1 2 3"

readr::read_fwf(f, skip_empty_rows = TRUE, show_col_types = FALSE)
#> # A tibble: 5 × 3
#>   X1    X2    X3   
#>   <chr> <chr> <chr>
#> 1 a     b     c    
#> 2 <NA>  <NA>  <NA> 
#> 3 1     2     3    
#> 4 <NA>  <NA>  <NA> 
#> 5 1     2     3

vroom::vroom_fwf(I(f), skip_empty_rows = TRUE, show_col_types = FALSE)
#> # A tibble: 5 × 3
#>   X1    X2    X3   
#>   <chr> <chr> <chr>
#> 1 a     b     c    
#> 2 <NA>  <NA>  <NA> 
#> 3 1     2     3    
#> 4 <NA>  <NA>  <NA> 
#> 5 1     2     3

Created on 2023-06-23 with reprex v2.0.2

I was also expecting the column names to be a, b, c rather than X1, X2, X3, but that does not affect me now.

readr::read_table gives the expected result. Unfortunately I cannot easily switch to this as I am not calling the function directly but via another package.
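
When the reader is called indirectly and cannot be swapped out, one possible stopgap is to clean up the returned tibble afterwards. The sketch below is a workaround under stated assumptions, not a fix for the underlying behaviour: it assumes that a row in which every column is NA came from an empty line, and that the first remaining row holds the column names. The object names `raw` and `cleaned` are illustrative only.

```r
library(readr)

f <- "a b c

1 2 3

1 2 3"

# Same call as in the reprex above; empty lines come back as all-NA rows.
raw <- read_fwf(f, skip_empty_rows = TRUE, show_col_types = FALSE)

# Keep only rows that have at least one non-missing value.
# Assumption: a row that is NA in every column was an empty line in the input.
cleaned <- raw[rowSums(!is.na(raw)) > 0, ]

# Optional: promote the first remaining row to column names.
# Assumption: that row is a header (true for this example only).
names(cleaned) <- unname(unlist(cleaned[1, ]))
cleaned <- cleaned[-1, ]
```

Note that this also drops any genuine record whose fields are all missing, and the remaining columns stay character, so a call such as readr::type_convert() may be needed afterwards.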

hadley commented 1 year ago

FWIW that looks more like what we call tabular data, and read_table() does a better job with it:

f <- "a b c

1 2 3

1 2 3"

readr::read_table(f, skip_empty_rows = TRUE)
#> # A tibble: 2 × 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
#> 2     1     2     3

Created on 2023-08-01 with reprex v2.0.2