tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

Guess big integers? #500

Open rjake opened 1 year ago

rjake commented 1 year ago

Our data warehouse started using integer64 values for our keys, and unless the column type is specified, vroom reads them in as col_double(). We want to reduce errors when saving and reading the data, so I wanted to know whether vroom could guess the col_big_integer() column type for us. I'm afraid folks on the team won't remember to specify it and will end up with duplicate key values in their analyses. For example, with the data below, dplyr::n_distinct(visit_key) would show 1 unique value even though the file contains 2 distinct keys.
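
The collapse described above is a property of double-precision numbers rather than anything specific to vroom: a double has a 53-bit significand, so not every integer above 2^53 (about 9.007e15) can be represented exactly. A minimal base R sketch, using the same two key values as the example below (the variable names are illustrative only):

```r
# Numeric literals without an L suffix are parsed as doubles in R.
key_a <- 100000000000000100
key_b <- 100000000000000101

# Both literals round to the same nearest representable double,
# so the two distinct keys become indistinguishable.
same_value <- key_a == key_b
n_unique <- length(unique(c(key_a, key_b)))

# Every integer up to 2^53 is exactly representable as a double;
# beyond that, gaps appear between representable values.
exact_limit <- 2^53

print(same_value)           # TRUE
print(n_unique)             # 1
print(key_a > exact_limit)  # TRUE
```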

x <- 
  I(
    "visit_key, name
    100000000000000100, A
    100000000000000101, B"
    #              ---
  )

# Default guessing reads visit_key as a double, so both keys round to the same value
vroom::vroom(x) |>
  dplyr::pull(visit_key)
#> 100000000000000096
#> 100000000000000096
#>                ---

# Specifying col_big_integer() keeps the two keys distinct
vroom::vroom(
  x,
  col_types = vroom::cols("visit_key" = vroom::col_big_integer())
) |>
  dplyr::pull(visit_key)
#> 100000000000000100
#> 100000000000000101
#>                ---
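
Independently of whether vroom guesses this type, one possible safeguard for the case where someone forgets col_types is to check a data frame after reading it and flag double columns whose values are large enough to have lost integer precision. The sketch below is base R only; find_imprecise_columns() is a hypothetical helper, not part of vroom, and the 2^53 threshold assumes the affected columns are meant to hold whole-number keys.

```r
# Hypothetical helper (not part of vroom): return the names of double columns
# holding values at or beyond 2^53, where whole numbers may have been rounded.
find_imprecise_columns <- function(df) {
  limit <- 2^53
  risky <- vapply(
    df,
    function(col) {
      # Skip integer64 columns: bit64 stores them in double vectors,
      # but they keep full 64-bit integer precision.
      is.double(col) &&
        !inherits(col, "integer64") &&
        any(abs(col) >= limit, na.rm = TRUE)
    },
    logical(1)
  )
  names(df)[risky]
}

# A plain data frame standing in for a file read with default guessing.
visits <- data.frame(
  visit_key = c(100000000000000100, 100000000000000101),
  name = c("A", "B")
)
flagged <- find_imprecise_columns(visits)
print(flagged)
```

A check like this could run inside a shared reading wrapper and warn or stop whenever it returns any column names, prompting the caller to re-read with col_big_integer().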