sfirke / janitor

simple tools for data cleaning in R
http://sfirke.github.io/janitor/
Other
1.38k stars 132 forks source link

suggestion: option to silence warnings in row_to_names #452

Closed mgacc0 closed 3 years ago

mgacc0 commented 3 years ago

Could you consider adding an optional parameter in row_to_names to not warn about "Row X does not provide unique names. Consider running clean_names() after row_to_names()." Something like warnings = FALSE or quiet = TRUE.

The warning text is displayed always, even if the next line is exactly

%>%
    janitor::clean_names()

In my case, I have a script with many thousands of lines where the warnings (if any) should be thoroughly reviewed. So this one is inadecuate.

billdenney commented 3 years ago

I understand the desire, but I don't think that I agree with the method.

The specific code is here: https://github.com/sfirke/janitor/blob/6cd6b8ea1b03990402b048de336902a043692193/R/row_to_names.R#L21-L23

And, the warning is needed in most cases. If the names before cleaning are not specific, then that will cause issues if clean_names() is not called. And, since the names going into clean_names() are not specific, then the names coming out of clean_names() will have the index added (e.g. a1, a2, etc.) which can also cause issues when not handled.

It is a fair point that we could add a single argument to silence this warning, but that becomes a slippery slope that we could add an argument to silence any warning. The quiet argument is used at least once within janitor to suppress messages, but since messages are simply informational and not something that is likely to need action, I would not want to go that path as it could yield confusion (i.e. why does quiet suppress warnings sometimes and just messages others?). And, we would then need to add tests that each of those flags worked to the testing code, ... (You can see how hairy this becomes.)

My preferred method would be that we take advantage of the classed warnings and errors in rlang (https://rlang.r-lib.org/reference/abort.html#muffling-and-silencing-conditions). The hard part here is that I don't know how to actually take advantage of them. I have looked a few times on how to react differently to different class arguments given to rlang::warn() (with a similar goal to yours), and I haven't found it.

billdenney commented 3 years ago

Now moments after writing the above, I learned that suppressWarnings() in base R has a classes option that will help. So, if we classed the warning, so that you could suppress it, would that solve your probelm?

rlang::warn("show this")
#> Warning: show this
rlang::warn("hide this", class="hider")
#> Warning: hide this

suppressWarnings({
  rlang::warn("show this")
  rlang::warn("hide this", class="hider")
},
classes="hider")
#> Warning: show this

Created on 2021-08-01 by the reprex package (v2.0.0)

billdenney commented 3 years ago

(Unrelated, I found a bug in reprex::reprex() while making the above example.)

mgacc0 commented 3 years ago

Now moments after writing the above, I learned that suppressWarnings() in base R has a classes option that will help. So, if we classed the warning, so that you could suppress it, would that solve your probelm?

rlang::warn("show this")
#> Warning: show this
rlang::warn("hide this", class="hider")
#> Warning: hide this

suppressWarnings({
  rlang::warn("show this")
  rlang::warn("hide this", class="hider")
},
classes="hider")
#> Warning: show this

Created on 2021-08-01 by the reprex package (v2.0.0)

That would be a good solution (at least for me).

sfirke commented 3 years ago

Fascinating, I had no idea about this concept. So we'd give the warning a class, and then a user could call supressWarnings() in such a way that it would only apply to that class - which makes it safer, since they won't risk inadvertently suppressing a separate, potentially-important warning? Sounds good to me.

billdenney commented 3 years ago

I think that it is best-practice for all warnings to be classed, but given that will cause a large number of classes to be created, I think we should be verbose in the class names so that there is no real chance for overlap. For example, the class here would be something like:

"janitor_warn_row_to_names_not_unique"

Where it would be a concatenation of package name, the text "warn", the function name, and a warning identifier.

Unless there is an objection to this with an alternate proposal (ideally the tidyverse style guide would give an indicator, but I don't see that at https://style.tidyverse.org/error-messages.html), I'll go ahead and make the PR. (But I'm pretty swamped with work for the next week or two, so it'll be about 2 weeks before I'm likely to be able to dive in. If someone else wants to make the PR...)