tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

Feature request: Column-level na values via collectors #532

Open khusmann opened 5 months ago

khusmann commented 5 months ago

Right now, NA values are specified globally via the na argument of the read_*() family of functions. Sometimes I want to supply NA values for specific columns rather than for the entire data set. A nice way to do this could be to add an na argument to all of the collector types, so that each collector can specify its own column-level missing values.
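To illustrate the limitation, here is a minimal sketch with made-up data (the column names height and attendance are assumptions for illustration only). A code that means "missing" in one column can be a valid value in another, but the global na argument blanks it out in both:

```r
library(vroom)

# Made-up data: "ABSENT" is a missing-value code for height,
# but a legitimate response for attendance.
df <- vroom(
  I("height,attendance\n170.5,PRESENT\nABSENT,ABSENT\n"),
  delim = ",",
  na = c("", "NA", "ABSENT"),
  col_types = cols(height = col_double(), attendance = col_character())
)

is.na(df$height)      # second row is NA, as intended
is.na(df$attendance)  # second row is also NA, which loses a real response
```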

Column-level missing values come up frequently in survey data. Here are two examples:

Example 1:

What is your current stress level?
a. Low (LOW)
b. Moderate (MODERATE)
c. High (HIGH)
d. I don’t know (DONT_KNOW)
e. I don’t understand the question (DONT_UNDERSTAND)

I'd like to be able to create a col_factor type that reads the last two responses as NA as follows:

col_factor(levels = c("LOW", "MODERATE", "HIGH"), ordered = TRUE, na = c("DONT_KNOW", "DONT_UNDERSTAND"))

Example 2:

An item that records the individual's height as a double, but can have the following missing values: "ABSENT", "RULER_BROKE"

col_double(na = c("ABSENT", "RULER_BROKE"))
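For comparison, this is roughly the workaround I use today: read the affected columns as character and convert them afterwards. It is only a sketch with made-up data; na_codes() is a hypothetical helper of my own, not part of vroom, and the column names are assumptions.

```r
library(vroom)

# Hypothetical helper (not part of vroom): replace column-specific
# missing-value codes in a character vector with NA.
na_codes <- function(x, na) {
  x[x %in% na] <- NA_character_
  x
}

# Made-up survey data; every column is read as character first.
raw <- vroom(
  I("stress,height\nLOW,170.5\nDONT_KNOW,ABSENT\nHIGH,RULER_BROKE\n"),
  delim = ",",
  col_types = cols(.default = col_character())
)

# Example 1: ordered factor with two column-level NA codes.
# (factor() would also turn any value outside `levels` into NA,
# so listing the codes explicitly mainly documents the intent.)
stress <- factor(
  na_codes(raw$stress, c("DONT_KNOW", "DONT_UNDERSTAND")),
  levels = c("LOW", "MODERATE", "HIGH"),
  ordered = TRUE
)

# Example 2: double with two column-level NA codes.
# Blanking the codes first avoids coercion warnings in as.double().
height <- as.double(na_codes(raw$height, c("ABSENT", "RULER_BROKE")))
```

This works, but it separates the missing-value codes from the column specification and requires a second pass over the data, which is what an na argument on the collectors would avoid.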