Closed by schmert 2 years ago
That's an interesting idea, thanks!
One immediate question I have is how best to handle variables for which not all the values are labelled when the user asks read_ipums_micro() for output_labelled_as "label" or "factor".
For example, say you have an income variable where the only labelled value is 9999999, which indicates a "Missing" value. If the user requests output_labelled_as "label" or "factor", I wouldn't really want to coerce all those unlabelled numeric values to characters or factor levels.
Perhaps the best way to handle that situation would be to skip variables for which only a subset of values are labelled, with a warning, and leave them as haven-labelled vectors. Does that seem sensible?
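The skip-with-a-warning behaviour described above could look roughly like the sketch below. It is only an illustration, not ipumsr code: `is_fully_labelled()` and `convert_fully_labelled()` are hypothetical helper names, the toy data are invented, and it assumes the {haven} package is installed.

```r
library(haven)

# Hypothetical helper: TRUE when every non-missing value of x has a label.
is_fully_labelled <- function(x) {
  vals <- unique(zap_labels(x))
  vals <- vals[!is.na(vals)]
  all(vals %in% attr(x, "labels"))
}

# Hypothetical helper: convert only fully labelled columns and leave
# partially labelled ones as haven-labelled vectors, with a warning.
convert_fully_labelled <- function(data, to = c("factor", "label")) {
  to <- match.arg(to)
  for (nm in names(data)) {
    x <- data[[nm]]
    if (!is.labelled(x)) next
    if (!is_fully_labelled(x)) {
      warning("Leaving partially labelled variable as haven-labelled: ", nm,
              call. = FALSE)
      next
    }
    f <- as_factor(x)
    data[[nm]] <- if (to == "factor") f else as.character(f)
  }
  data
}

# Toy data (invented): SEX is fully labelled, INCOME only labels 9999999.
dat <- data.frame(id = 1:3)
dat$SEX <- labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))
dat$INCOME <- labelled(c(25000, 9999999, 48000), labels = c(Missing = 9999999))

out <- convert_fully_labelled(dat, to = "factor")  # warns about INCOME
class(out$SEX)
class(out$INCOME)
```

With this approach INCOME keeps its numeric values and its "Missing" label, so the user can still decide how to treat 9999999 afterwards.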
Partially-labeled variables like top- or bottom-coded numerical ranges are a problem that I hadn't considered. That would get complicated to explain, I suppose. Another approach might be to add parameters that explicitly list which haven-labelled variables to convert.
Like
haven_to_factor = c('SAMPLE', 'COUNTRY')
haven_to_label = c('SERIES')
or something?
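For illustration, the explicit-list idea could be prototyped as a post-processing step on the data frame returned by read_ipums_micro(). This is only a sketch under assumptions: `convert_listed()` is a hypothetical function, `haven_to_factor` and `haven_to_label` are the argument names floated above rather than existing ipumsr arguments, the toy values and labels are invented, and {haven} is assumed to be installed.

```r
library(haven)

# Hypothetical post-processing step: convert only the named columns.
# Columns that are missing or not haven-labelled are left untouched.
convert_listed <- function(data,
                           haven_to_factor = character(),
                           haven_to_label = character()) {
  for (nm in intersect(haven_to_factor, names(data))) {
    if (is.labelled(data[[nm]])) data[[nm]] <- as_factor(data[[nm]])
  }
  for (nm in intersect(haven_to_label, names(data))) {
    if (is.labelled(data[[nm]])) {
      data[[nm]] <- as.character(as_factor(data[[nm]]))
    }
  }
  data
}

# Toy data (invented values and labels).
dat <- data.frame(id = 1:2)
dat$COUNTRY <- labelled(c(76, 484), labels = c(Brazil = 76, Mexico = 484))
dat$SERIES <- labelled(c(1, 2), labels = c(First = 1, Second = 2))
dat$INCOME <- labelled(c(25000, 9999999), labels = c(Missing = 9999999))

out <- convert_listed(dat,
                      haven_to_factor = c("SAMPLE", "COUNTRY"),
                      haven_to_label = c("SERIES"))
class(out$COUNTRY)
class(out$SERIES)
class(out$INCOME)
```

Because the user names the columns, partially labelled variables such as INCOME are never converted by accident.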
Sorry for the long silence. I still think this is an interesting idea, and we will consider including it in our next release.
Thanks, Derek.
Carl
Closed here because moved to ipums/ipumsr#11
Haven-labelled variables are unfamiliar to many R users. The {ipumsr} documentation even includes instructions that suggest that R users will almost always want to alter the haven-labelled variables output by read_ipums_micro before doing any real work -- with zap_values, to_character, etc.
Would it be possible to add a parameter to read_ipums_micro that allows the user to choose how labelled variables are output in the first place? For example, output_labelled_as = c("haven", "value", "label", "factor") with the default being the current "haven"?
This could save R users a ton of headaches. Thanks.
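To make the request concrete, here is a rough sketch of what the four proposed output modes might mean for a single fully labelled vector. It is not part of ipumsr: `convert_labelled()` is a hypothetical helper, `output_labelled_as` is the proposed argument name, and the sketch assumes {haven} is installed.

```r
library(haven)

# Hypothetical helper showing one possible meaning of each proposed mode.
convert_labelled <- function(x,
                             output_labelled_as = c("haven", "value",
                                                    "label", "factor")) {
  output_labelled_as <- match.arg(output_labelled_as)
  switch(output_labelled_as,
    haven  = x,                           # keep the haven-labelled vector
    value  = zap_labels(x),               # drop labels, keep raw values
    label  = as.character(as_factor(x)),  # character vector of labels
    factor = as_factor(x)                 # factor with labels as levels
  )
}

# Toy vector (invented).
sex <- labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))

convert_labelled(sex, "value")
convert_labelled(sex, "label")
convert_labelled(sex, "factor")
```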