tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

Edition 2 reading missing factor rows as empty strings #487

Closed by Raesu 1 year ago

Raesu commented 1 year ago

I am trying to migrate to edition 2, but there's an issue when reading in factors using the col_types='f' argument.

With edition 1, missing values in factor columns were read in as NA, as expected. In edition 2, they are read in as empty strings, "". Reprex below. I have not seen any documentation of this change, and it has made my downstream code fail completely. Adding a mutate() step to convert the empty strings back to NA does not seem like the right solution.

library(readr)

# Expected output: edition 1 reads the empty bravo fields as NA
with_edition(1,
  read_csv('alpha,bravo,charlie
         agree,,satisfied
         neutral,,dissatisfied', col_types='fff'))

# Edition 2 output: the empty bravo fields come back as ""
read_csv('alpha,bravo,charlie
         agree,,satisfied
         neutral,,dissatisfied', col_types='fff')

jennybc commented 1 year ago

This will need to be thought about / addressed in vroom, which powers the second edition. Transferring.

Raesu commented 1 year ago

I actually tried the above code on my Mac and it is working as expected (edition 1 and 2 outputs are the same).

I was previously running this on a Windows 10 machine.

hadley commented 1 year ago

This looks ok to me too:

library(readr)

# Expected output
with_edition(1,
read_csv('alpha,bravo,charlie
         agree,,satisfied
         neutral,,dissatisfied', col_types='fff'))
#> # A tibble: 2 × 3
#>   alpha   bravo charlie     
#>   <fct>   <fct> <fct>       
#> 1 agree   <NA>  satisfied   
#> 2 neutral <NA>  dissatisfied

# Edition 2 output
read_csv('alpha,bravo,charlie
         agree,,satisfied
         neutral,,dissatisfied', col_types='fff')
#> # A tibble: 2 × 3
#>   alpha   bravo charlie     
#>   <fct>   <fct> <fct>       
#> 1 agree   <NA>  satisfied   
#> 2 neutral <NA>  dissatisfied

Created on 2023-08-01 with reprex v2.0.2
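
Since the behaviour reported in this thread differed between machines, a quick check like the one below may help confirm whether a given setup is affected. It is only a sketch: it reuses the data from the reprex, and the variable names (csv, ed1, ed2) are made up for illustration. It assumes readr 2.0 or later, which is where with_edition() and the vroom-backed second edition are available.

```r
library(readr)

# Same data as the reprex, written as a literal string
# (readr treats a string containing a newline as literal data).
csv <- "alpha,bravo,charlie\nagree,,satisfied\nneutral,,dissatisfied\n"

# Read the same input under both editions.
ed1 <- with_edition(1, read_csv(csv, col_types = "fff"))
ed2 <- read_csv(csv, col_types = "fff")

# TRUE means the empty bravo fields were parsed as NA;
# FALSE means they came through as some non-missing value such as "".
all(is.na(ed1$bravo))
all(is.na(ed2$bravo))

# Package versions and platform are worth including in any report,
# since the results in this thread varied by machine.
packageVersion("readr")
packageVersion("vroom")
R.version$platform
```

If the two is.na() checks disagree, posting that result together with the version and platform lines should make the difference easier to reproduce.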