victorhartman closed this issue 1 year ago.
Hi @VictorHartman,
Reading file metadata is a relatively tiny part of the reading process, so skipping it wouldn't give any noticeable performance improvement, and the zap_*() functions provide the means to remove metadata as needed. These functions only modify attributes when removing metadata, so they should also have a very small performance impact.
One exception to this is when user-defined missing values are converted to NA: because this modifies the values in the vector itself, it can have a noticeable impact on performance. Converting user-defined missing values to NA in this way is the default behaviour of both read_sav() and zap_labels(), but if you'd prefer to keep the original values rather than converting them to NA, you can use the user_na argument for both functions:
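```r
library(haven)
# user_na = TRUE keeps user-defined missing values instead of converting them to NA
df <- read_sav("df.sav", user_na = TRUE)
df <- df |>
  zap_label() |>              # drop variable labels
  zap_labels(user_na = TRUE)  # drop value labels, keep the missing-value codes
```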
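To make the difference between the two user_na settings concrete without needing a .sav file, here is a small in-memory sketch. It assumes a single hypothetical labelled_spss variable in which 99 is declared as a user-defined missing value, standing in for a column returned by read_sav(user_na = TRUE). With the default, zap_labels() has to rewrite 99 as NA; with user_na = TRUE it only drops the attributes and the stored values are left alone.

```r
library(haven)

# Hypothetical variable: 99 is declared as a user-defined missing value
x <- labelled_spss(
  c(1, 2, 99),
  labels = c(Yes = 1, No = 2, Refused = 99),
  na_values = 99
)

# Default (user_na = FALSE): the user-defined missing value 99 is replaced by NA,
# i.e. the values in the vector are modified
as_na <- zap_labels(x)

# user_na = TRUE: only the labelling attributes are removed, 99 is kept as-is
as_is <- zap_labels(x, user_na = TRUE)

print(as_na)
print(as_is)
```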
Currently I am working with many datasets for which I do not use the labels or labelled data. To improve reading performance, would it be possible to skip the labelled data when reading? Also, I assume user-defined missing values would then no longer be converted to NA? Would it actually improve performance?
Now I do this for each dataset (in a loop), which seems wasteful.
So something like:
```r
df <- read_sav("df.sav", skip_labels = TRUE)
```
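Since skip_labels here is only a proposed argument, one way to make the per-dataset loop less repetitive with the existing functions is to wrap the read-and-zap steps in a single helper. This is a minimal sketch, not part of haven: read_sav_plain() is a hypothetical name, and the "data" directory is a placeholder for wherever the .sav files actually live. It uses user_na = TRUE in both calls so the original missing-value codes are kept and no values are rewritten.

```r
library(haven)

# Hypothetical helper (not part of haven): read one .sav file, then drop
# variable labels and value labels while keeping user-defined missing codes
read_sav_plain <- function(path) {
  df <- read_sav(path, user_na = TRUE)
  df <- zap_label(df)
  zap_labels(df, user_na = TRUE)
}

# Placeholder location: assumes all .sav files sit in one directory
files <- list.files("data", pattern = "\\.sav$", full.names = TRUE)
datasets <- lapply(files, read_sav_plain)
names(datasets) <- basename(files)
```

If the datasets do not need the original missing-value codes, dropping user_na = TRUE from both calls restores the default conversion to NA.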