tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

Track skips for problems #448

Open sbearrows opened 2 years ago

sbearrows commented 2 years ago

Closes #295

We want to track skipped lines so that problems() reporting is more accurate. Currently, if the file contains comment lines, empty lines, or lines skipped via skip = ..., the positions reported by problems() are inaccurate.

Also, problems() currently reports row and column numbers relative to the original file. This PR adds a new column, line, which takes over that meaning from row and makes it clear that the value is a line number in the original file. row now represents the row in the data frame.

library(vroom)

delim_input <- glue::glue("#Name:,Sharla
                          #Date:,02/01/22
                          x,y
                          1,1
                          2,2.x")

output <- vroom(I(delim_input),
  col_types = "dd", comment = "#", altrep = FALSE
)
#> Warning: One or more parsing issues, call `problems()` on your data frame
#> for details, e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

output
#> # A tibble: 2 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2    NA

problems(output)
#> # A tibble: 1 × 6
#>    line   row   col expected actual file                         
#>   <int> <int> <int> <chr>    <chr>  <chr>                        
#> 1     5     2     2 a double 2.x    /private/var/folders/4g/9jcx…
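
The relationship between line and row in the output above can be sketched in base R. This is only an illustration of the bookkeeping, not vroom's implementation: row_to_line() is a hypothetical helper, and it assumes that comments occupy whole lines, that empty lines are dropped, and that the first remaining line is the header.

```r
# Hypothetical helper (not part of vroom): for each data-frame row,
# return the line number it came from in the original file.
row_to_line <- function(lines, comment = "#", skip = 0, header = TRUE) {
  idx <- seq_along(lines)

  # Assumption: only whole-line comments are considered.
  is_comment <- if (nzchar(comment)) {
    startsWith(lines, comment)
  } else {
    rep(FALSE, length(lines))
  }
  is_empty <- !nzchar(trimws(lines))

  # Lines that survive skip = ..., comment filtering and empty-line removal.
  kept <- idx[idx > skip & !is_comment & !is_empty]

  # The first surviving line is the header, so it is not a data row.
  if (header && length(kept) > 0) {
    kept <- kept[-1]
  }
  kept
}

lines <- c("#Name:,Sharla", "#Date:,02/01/22", "x,y", "1,1", "2,2.x")
mapping <- row_to_line(lines)

# Data-frame row 2 corresponds to line 5 of the file.
mapping[2]
```

With the same input as the example, the second data row maps to line 5, which agrees with the line and row values shown by problems(output).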