ropensci / spelling

Tools for Spell Checking in R
https://docs.ropensci.org/spelling
Other
107 stars 25 forks source link

Include README.md/Rmd in spell_check_package() #11

Closed noamross closed 5 years ago

noamross commented 6 years ago

I suggest that one include the README in spell_check_package(). Another option would be to find all Rmd or md files in the package, including ones outside vignettes/, such as ones that may be in inst.

dpprdan commented 6 years ago

+1 on the second option, e.g. NEWS.md (also NEWS without .md).

kbenoit commented 5 years ago

This would be a welcome addition! I tried setting this up myself by adding to spelling.R this way:

if (requireNamespace("spelling", quietly = TRUE)) {
    error <- TRUE
    additional_files <- c("index.Rmd", "README.Rmd", "installation.Rmd")

    # regular package check
    spelling::spell_check_test(vignettes = TRUE, error = error,
                               skip_on_cran = TRUE)

    # check additional files
    results <- spelling::spell_check_files(
        path = additional_files,
        ignore = spelling::get_wordlist(),
        lang = "en_GB"
    )
    if (nrow(results)) {
        if (error) {
            output <- sprintf("Potential spelling errors: %s\n", paste(results$word, collapse = ", "))
            stop(output, call. = FALSE)
        } else {
            cat("Potential spelling errors:\n")
            print(results)
        }
    }
}

but in the CHECK it generates this error:

── Test failures ───────────────────────────────────────────────── spelling ────

> if (requireNamespace("spelling", quietly = TRUE)) {
+     error <- TRUE
+     additional_files <- c("index.Rmd", "README.Rmd", "installation.Rmd")
+ 
+     # regular package check
+     spelling::spell_check_test(vignettes = TRUE, error = error,
+                                skip_on_cran = TRUE)
+ 
+     # check additional files
+     results <- spelling::spell_check_files(
+         path = additional_files,
+         ignore = spelling::get_wordlist(),
+         lang = "en_GB"
+     )
+     if (nrow(results)) {
+         if (error) {
+             output <- sprintf("Potential spelling errors: %s\n", paste(results$word, collapse = ", "))
+             stop(output, call. = FALSE)
+         } else {
+             cat("Potential spelling errors:\n")
+             print(results)
+         }
+     }
+ }
All Done!
Error in is.character(ignore) : 
  path[1]="./DESCRIPTION": No such file or directory
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: R CMD check found ERRORs
Execution halted

Exited with status 1.
jeroen commented 5 years ago

That makes sense. I guess we could just include ".md" and ".Rmd" in the package root?

jeroen commented 5 years ago

OK I pushed a fix, could you test this?

dpprdan commented 5 years ago

I tested it on one package and it seems to work fine in general.

Three suggestions/questions though:

  1. I think it would be better to exclude .md files from spellchecking if there are .rmd files with the same name, e.g. don't check README.md if there is a README.Rmd. Apart from having the same spelling mistakes come up twice, spell checking README.md will also check evaluated code from README.Rmd. This might include words, the package author does not have control over, e.g. results from API calls. This could significantly expand the packages' word list.

  2. Only slightly related, but because it involves the same code: I think it might make sense to include (r)md files from subfolders of vignettes, e.g. if one uses Rmd child documents in README.Rmd and/or vignettes. I think it would be safe to assume that (r)mds from vignettes subfolders are somehow vignette-related, would it not?

  3. Also beyond the scope of this, but would it be possible/make sense to include, e.g. README, CHANGELOG, and NEWS without an .md extension?

Finally, and just out of curiosity, why did you backpaddle?

kbenoit commented 5 years ago

This works great for me now, thanks @jeroen. By checking the .md files in the package root directory, as well, it makes our setup in spacyr work where we have two headers for README.Rmd and index.Rmd that import the child installation.Rmd. (See https://github.com/quanteda/spacyr/blob/master/README.Rmd) By checking the .md files it catches errors in the final document that were made in the child.

However it only finds these in README.md, not index.md. Can you add checking for index.md as well?

dpprdan commented 5 years ago

@kbenoit The check of files in the package root is limited to readme, news and changes with an (R)md extension at the moment. Jeroen restricted it to this after allowing all (R)md files initially and probably for good reason (which I don't know however, hence my question about backpaddling (link now fixed)).

If we can agree on checking for Rmds in sub-folders of vignettes (or another more appropriate place for child documents), then you could move installation.Rmd to such a directory and have it checked as well. My point being that it is better to check the source Rmd instead of the derived md files.

kbenoit commented 5 years ago

Got it. Adding installation.Rmd to vignettes/ should work, if the change is made to check Rmd instead of md files.

I was also suggesting adding index files to https://github.com/ropensci/spelling/commit/e965628641384697b2a9979fe626ec7465263f59#diff-1cd593a90750eb1784cb42ead1103f25R70.

jeroen commented 5 years ago

I initially implemented it so that it would check all rm and rmd files in the package root, but when I tested it on some random packages quite a few had cruft md files in the package root that they were probably not very interested spelling. So then I changed it to only check readme.md and news.md.

The rationale here is that CRAN will publish readme.md and news.md on the website, but not other files.

I'm still on the fence about this, not sure what would be the most user friendly default default. Perhaps we should just check everything in the root dir and force the pkg authors to keep their root clean.

kbenoit commented 5 years ago

Without weighing in on the issue of whether to check everything in package root, I would suggest adding index.md to the current list.

With weighing in though 😄 I would be happy to be forced to keep all my root .md files spell-checked.

jeroen commented 5 years ago

I'll add index.md to the list. Many packages have stuff like cran-comments.md or CODE_OF_CONDUCT.md or license.md and that kind of stuff, so I think checking all those is a bit overkill...