Feature: Define exclusions

wjakethompson commented 2 weeks ago

It would be nice to be able to define exclusions to specific instances of lints, similar to lintr. For example, some of my documentation titles or parameter descriptions include proper nouns, or acronyms. This causes the sentence case linter to flag, because it is expecting all lower case after the first word. The result is that I end up getting a lot of false flags, which makes it hard to find the true issues I want to fix.

My particular use case is here: https://github.com/wjakethompson/measr/tree/doc-cleanup, which can reproduce the FALSE flags I'm describing.

dgkf commented 2 weeks ago

First - sorry that the package is generating noise in your use case, and thank you for reporting it.

Even beyond the ability to disable individual lints, it's probably a bit overly sensitive to assume that all characters should be lowercase. To start, I think the linter should be relaxed to only enforce that the first character is upper-case.

Then to address your suggestion, I also think there should be some mechanism of disabling individual lints for roxygen tags. I'll have to do a bit of brainstorming to come up with a syntax that is both convenient from a developer perspective, while avoiding injecting any content into the Rd files.

dgkf commented 1 week ago

Just following up to say that I have a work-in-progress kicked off in the @7-sentence-case branch.

Progress so far

This really opened a can of worms, but I think the project will be better for it. Just to briefly step through the series of events:

I felt like the most actionable way to suppress lints would be to use a no-code inline code block such as `r # nolint `. This is nice for a couple reasons:
- it doesn't actually emit anything to the documentation
- we can still pick it out from the raw input to suppress lints
However, then I realized that inline code blocks will trigger lints. For example This is Title `r print("Case")` will currently get flagged as not title case because the linter is currently very rudimentary and will think print is part of the title.
So then I went down the path of rendering the roxygen content to Rd files and trying to use the semantic meaning embedded in the Rd data to choose which elements should be lintable.

Overall, this is probably a more comprehensive approach, but it is a big change that adds a lot of moving pieces to the package which makes me a bit nervous.

`measr/R/data.R`

I've tested it against your measr package, and it seems to work. Just to show a few options that you would have to address lints:

An alternative to # nolint, this feature will also ignore specific Rd tags. Among these are \dfn and \acronym (as mentioned in the R extensions manual). Now that the Rd semantics are used to decide when to lint, they can be appropriately ignored for things like acronyms.

- #' Examination for the Certificate of Proficiency in English (ECPE)
+ #' Examination for the \dfn{Certificate of Proficiency in English}
+ #' (\acronym{ECPE})

For situations where there isn't an appropriate Rd tag, you can always use # nolint. You can either use a blanket `r # nolint` to disable everything, or specify linters to disable, matching the syntax for {lintr}.

- #' MacReady & Dayton (1977) Multiplication Data
+ #' MacReady & Dayton (1977) Multiplication Data  `r # nolint: sentence_case.`

dgkf commented 1 week ago

And a couple new ideas.

Maybe roxylint should provide a RdMacro \nolint{} that can be used to easily omit parts of your docs from being linted.
Since many of your lints are proper nouns, maybe roxylint can create an allow-list based on names discovered in the CITATION file. I really like this approach because it not only encourages clean docs, but also nudges developers toward comprehensive citations. I'm not terribly familiar with the standard practices for citations these days - if you have ideas on how this should work, I'd greatly appreciate it!
We can take a look at @concept for each block (or maybe for the whole package?) to excuse these from lints.

wjakethompson commented 1 week ago

Thanks for considering this feature! I really like the idea a RdMacro. The CITATION file might work. I agree with about encouraging citations, but I can also see use cases where not all words that need to be ignored end up in a citation file. For example, I'm currently trying to get my hands on another data set to include, which is data collected from a specific state. So I would probably say the state name in docs, but the state name isn't actually in the preferred citation for the data set.

openpharma / roxylint

Feature: Define exclusions #7

Progress so far

`measr/R/data.R`