lsms-worldbank / labeller

Spot and fix common problems with variable and value labels🏷
https://lsms-worldbank.github.io/labeller/
0 stars 0 forks source link

Create commands to check variable label length #15

Closed arthur-shaw closed 10 months ago

arthur-shaw commented 10 months ago

@kbjarkefur , can you think of shorter function names that are still indicative of function?

arthur-shaw commented 10 months ago

Closes #8

kbjarkefur commented 10 months ago

I made some fixes. Mostly stylish and other edits that will only have an effect in rare cases. See the commit description for details.

For command names, how about lbl_list_long_vars and lbl_assert_no_long_vars. Not much shorter, and perhaps a bit less precise, but feels like there is less risk you have to read the names twice to get what they do.

kbjarkefur commented 10 months ago

Next I have a question about whether these commands should have the feature to accept a varlist. I can see argument for both sides, and I think there is no objective correct decision. It comes down to what we want this command to be.

Do we see these as commands as a tool that users should be able to use however flexible they want? Or should these commands be an opinionated tool that certifies something about the full data set? For example, if a user passes a varlist to lbl_assert_no_var_max_len then that command does no longer guarantee that there are no long var labels in the dataset. But if we do not allow a varlist we might be restricting the full potential utility.

I implemented lbl_list_pipes without varlist, but my only strong preference is that we are consistent accross the package. There are some fixes can do regardless of what we decide we want.

arthur-shaw commented 10 months ago

Command names

On the command names, inspired by your suggestions, I have a few others.

To avoid repetition of max_len (echoed in the maxlen option), here are some equally long names:

To prioritize brevity (at the expense of repetition):

And to add two more characters (at the expense, somewhat of brevit, and precision):

Allow varlist or not

I'm inclined to offer users the ability to provide a variable list. I expect that the option will be rarely used, but might be useful for those that would like to use it.

I do, however, appreciate your point that if a varlist is possible that the assertion commands lose their utility as a "your data are OK" seal of approval.

Perhaps we could address this concern by providing more details in the message printed to the console. In particular, if any varlist is provided, the command should indicate the scope for which the assertion is valid (i.e., the varlist).

What do you think--on both points?

kbjarkefur commented 10 months ago

Renamed to:

And moved varlist to https://github.com/lsms-worldbank/labeller/issues/18