Open nuno-agostinho opened 4 years ago
I have the same issue, picked up with ordinal indicators. It looks like this is a problem with the hunspell parser:
hunspell::hunspell_parse(c("1st", "RNA-seq", "EIF4G1"))
#> [[1]]
#> [1] "st"
#>
#> [[2]]
#> [1] "RNA" "seq"
#>
#> [[3]]
#> [1] "EIF" "G"
Created on 2021-02-06 by the reprex package (v0.3.0)
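To make the pattern in that output easier to see, here is a rough base R sketch that reproduces the same fragments. It assumes a token is simply a maximal run of letters; that is a guess that happens to match the output above, not a description of how hunspell actually tokenizes. `letters_only_tokens()` is a hypothetical helper name.

```r
# Assumption: a token is a maximal run of letters, so digits and hyphens
# act as separators. This only mimics the output shown above; it is NOT
# hunspell's real tokenizer.
letters_only_tokens <- function(words) {
  regmatches(words, gregexpr("[[:alpha:]]+", words))
}

tokens <- letters_only_tokens(c("1st", "RNA-seq", "EIF4G1"))
str(tokens)
```

Under that assumption, "1st" is reduced to "st", "RNA-seq" becomes "RNA" and "seq", and "EIF4G1" becomes "EIF" and "G", which would explain why full entries in a word list never get a chance to match.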
Implementing a pre-filter right before the parse here could work. It feels more like a quick fix, because it splits the text with strsplit() and then paste()s the pieces back together before they are sent to the actual parsing function:
ignore_words <- c("1st", "RNA-seq", "EIF4G1")
lines <- c(
  "This is the 1st line. It has first written in it.",
  "The second has RNA-seq inside. But does not use RNAseq -- without the '-'",
  "EIF4G1 but not EIF4G1fdsadf is used",
  "This line's words are fine!"
)
pre_filter_plain <- function(lines, ignore = character()) {
  word_list <- strsplit(lines, "([^-[:alnum:][:punct:]])")
  vapply(
    word_list,
    function(i) {
      paste(i[!i %in% ignore], collapse = " ")
    },
    character(1)
  )
}
pre_filter_plain(lines, ignore_words)
#> [1] "This is the line. It has first written in it."
#> [2] "The second has inside. But does not use RNAseq -- without the '-'"
#> [3] "but not EIF4G1fdsadf is used"
#> [4] "This line's words are fine!"
Created on 2021-02-06 by the reprex package (v0.3.0)
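One limitation of the filter above is that a token such as "RNA-seq." (with a trailing full stop) or "(EIF4G1)" is not an exact match for an ignore entry and so is kept. A possible variant, sketched below in base R only, strips leading and trailing punctuation before comparing. `pre_filter_trim()` is a hypothetical name and this has not been tried inside the spelling package itself.

```r
# Hypothetical variant of pre_filter_plain(): tokens are compared against
# the ignore list after leading/trailing punctuation is stripped, so
# "RNA-seq." and "(EIF4G1)" are dropped too.
pre_filter_trim <- function(lines, ignore = character()) {
  # Split on runs of whitespace only; inner hyphens and digits stay intact.
  word_list <- strsplit(lines, "[[:space:]]+")
  vapply(
    word_list,
    function(words) {
      # Bare form used only for the comparison, e.g. "(EIF4G1)" -> "EIF4G1".
      bare <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", words)
      paste(words[!bare %in% ignore], collapse = " ")
    },
    character(1)
  )
}

filtered <- pre_filter_trim(
  "We used RNA-seq. The (EIF4G1) gene was 1st.",
  ignore = c("1st", "RNA-seq", "EIF4G1")
)
filtered
#> [1] "We used The gene was"
```

The trade-off is that the punctuation attached to an ignored token is removed with it, so sentence boundaries can disappear from the filtered text. That should not matter for spell checking, but it would for anything that relies on the original punctuation.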
I am using the following words in my package:
After inserting these words in inst/WORDLIST and running spelling::spell_check_package(), the function reports that the words seq, st, nd and EIF are misspelled.
Currently, my WORDLIST includes the words seq, st, nd and EIF to avoid triggering the spell checker, but I would prefer to include the full words. Thanks.