mnpopcenter / ipumsr

Read IPUMS data into R.
http://tech.popdata.org/ipumsr
Mozilla Public License 2.0

optional parameter in read_ipums_micro to choose how haven-labelled variables are handled #68

Closed schmert closed 2 years ago

schmert commented 3 years ago

Haven-labelled variables are unfamiliar to many R users. The {ipumsr} documentation even includes instructions that suggest that R users will almost always want to alter the haven-labelled variables output by read_ipums_micro before doing any real work -- with zap_values, to_character, etc.

Would it be possible to add a parameter to read_ipums_micro that allows the user to choose how labelled variables are output in the first place? For example, output_labelled_as = c("haven", "value", "label", "factor") with the default being the current "haven"?
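For illustration only, here is a rough sketch of what the four proposed options could map to, applied after reading rather than inside read_ipums_micro(). The helper `convert_labelled()` is hypothetical and not part of ipumsr; it relies only on existing haven functions (`is.labelled()`, `zap_labels()`, `as_factor()`), and the `sex` vector is toy data.

```r
library(haven)

# Hypothetical helper (not an ipumsr function): convert one haven-labelled
# vector according to the option names proposed in this issue.
convert_labelled <- function(x,
                             output_labelled_as = c("haven", "value", "label", "factor")) {
  output_labelled_as <- match.arg(output_labelled_as)
  if (!is.labelled(x)) {
    return(x)  # leave ordinary vectors untouched
  }
  switch(
    output_labelled_as,
    haven  = x,                             # current behaviour: keep as-is
    value  = zap_labels(x),                 # drop labels, keep the raw codes
    label  = as.character(as_factor(x)),    # replace codes with label text
    factor = as_factor(x)                   # factor whose levels are the labels
  )
}

# Toy data: a fully labelled variable
sex <- labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))

convert_labelled(sex, "value")
convert_labelled(sex, "label")
convert_labelled(sex, "factor")
```

Note that this sketch assumes every value has a label; it does not address variables where only some values are labelled.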

This could save R users a ton of headaches. Thanks.

dtburk commented 3 years ago

That's an interesting idea, thanks!

One immediate question I have is how best to handle variables for which not all the values are labelled when the user asks read_ipums_micro() for output_labelled_as = "label" or "factor".

For example, say you have an income variable where the only labelled value is 9999999, which indicates a "Missing" value. If the user requests output_labelled_as "label" or "factor", I wouldn't really want to coerce all those unlabelled numeric values to characters or factor levels.

Perhaps the best way to handle that situation would be to skip variables for which only a subset of values are labelled, with a warning, and leave them as haven-labelled vectors. Does that seem sensible?
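A minimal sketch of that "skip with a warning" behaviour, again as a post-processing step rather than anything built into ipumsr. The helpers `is_fully_labelled()` and `factors_where_safe()` are hypothetical names, and the columns `sex` and `income` are toy data modelled on the income example above.

```r
library(haven)

# Hypothetical helper: TRUE when every non-missing value in x has a label.
is_fully_labelled <- function(x) {
  vals <- unique(as.vector(zap_labels(x)))
  vals <- vals[!is.na(vals)]
  all(vals %in% attr(x, "labels", exact = TRUE))
}

# Hypothetical helper: convert fully labelled columns to factors and leave
# partially labelled columns as haven-labelled vectors, with a warning.
factors_where_safe <- function(data) {
  for (nm in names(data)) {
    x <- data[[nm]]
    if (!is.labelled(x)) next
    if (is_fully_labelled(x)) {
      data[[nm]] <- as_factor(x)
    } else {
      warning("Leaving '", nm, "' as haven-labelled: only some values are labelled.",
              call. = FALSE)
    }
  }
  data
}

# Toy data: `sex` is fully labelled; `income` labels only its missing code.
toy <- tibble::tibble(
  sex    = labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2)),
  income = labelled(c(25000, 9999999, 40000), labels = c(Missing = 9999999))
)

result <- factors_where_safe(toy)  # warns about `income`
```

Here `result$sex` becomes a factor while `result$income` stays haven-labelled, so the unlabelled numeric values are never coerced.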

schmert commented 3 years ago

Partially-labeled variables like top- or bottom-coded numerical ranges are a problem that I hadn't considered. That would get complicated to explain, I suppose. Another approach might be to add parameters that explicitly list which haven-labelled variables to convert.

Like

haven_to_factor = c('SAMPLE', 'COUNTRY')
haven_to_label = c('SERIES')

or something?
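As a rough sketch of how such explicit lists might behave, the hypothetical wrapper below takes the two suggested arguments and converts only the named columns. `convert_listed()` is not an ipumsr function, and the column values and labels are made-up toy data; only the argument names come from the suggestion above.

```r
library(haven)

# Hypothetical wrapper: convert only the columns the user names explicitly.
convert_listed <- function(data,
                           haven_to_factor = character(),
                           haven_to_label = character()) {
  for (nm in intersect(haven_to_factor, names(data))) {
    if (is.labelled(data[[nm]])) {
      data[[nm]] <- as_factor(data[[nm]])
    }
  }
  for (nm in intersect(haven_to_label, names(data))) {
    if (is.labelled(data[[nm]])) {
      data[[nm]] <- as.character(as_factor(data[[nm]]))
    }
  }
  data
}

# Toy extract with invented codes and labels
extract <- tibble::tibble(
  COUNTRY = labelled(c(1, 2), labels = c("Country A" = 1, "Country B" = 2)),
  SERIES  = labelled(c(10, 20), labels = c("Series X" = 10, "Series Y" = 20)),
  INCOME  = labelled(c(25000, 9999999), labels = c(Missing = 9999999))
)

converted <- convert_listed(
  extract,
  haven_to_factor = c("SAMPLE", "COUNTRY"),  # "SAMPLE" is absent here and is skipped
  haven_to_label  = c("SERIES")
)
```

Because the user opts in column by column, a partially labelled variable such as `INCOME` is left alone unless it is listed.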

dtburk commented 3 years ago

Sorry for the long silence. I still think this is an interesting idea, and we will consider including it in our next release.

schmert commented 3 years ago

Thanks, Derek.

Carl


dtburk commented 2 years ago

Closing this issue here because it has been moved to ipums/ipumsr#11.