statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Test failures after updating stringi to v1.8.1 #83

Closed by gagolews 8 months ago

gagolews commented 8 months ago

Hi! I see the following errors when running reverse dependency checks after updating stringi to v1.8.1:

Package: cleanNLP
Check: tests
New result: ERROR
    Running ‘testthat.R’ [1s/1s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > library(testthat)
    > library(cleanNLP)
    > 
    > test_check("cleanNLP")
    [ FAIL 3 | WARN 0 | SKIP 3 | PASS 0 ]

    ══ Skipped tests (3) ═══════════════════════════════════════════════════════════
    • On CRAN (3): 'test-annotation.R:24:3', 'test-annotation.R:33:3',
      'test-annotation.R:42:3'

    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Error ('test-annotation.R:9:3'): testing stringi ────────────────────────────
    Error in `(function (case_insensitive, comments, dotall, dot_all = dotall, 
        literal, multiline, multi_line = multiline, unix_lines, uword, 
        error_on_unknown_escapes, time_limit = 0L, stack_limit = 0L) 
    {
        opts <- list()
        if (!missing(case_insensitive)) 
            opts["case_insensitive"] <- case_insensitive
        if (!missing(comments)) 
            opts["comments"] <- comments
        if (!missing(literal)) 
            opts["literal"] <- literal
        if (!missing(unix_lines)) 
            opts["unix_lines"] <- unix_lines
        if (!missing(uword)) 
            opts["uword"] <- uword
        if (!missing(error_on_unknown_escapes)) 
            opts["error_on_unknown_escapes"] <- error_on_unknown_escapes
        if (!missing(stack_limit)) 
            opts["stack_limit"] <- stack_limit
        if (!missing(time_limit)) 
            opts["time_limit"] <- time_limit
        if (!missing(dotall)) 
            opts["dotall"] <- dotall
        else if (!missing(dot_all)) 
            opts["dotall"] <- dot_all
        if (!missing(multiline)) 
            opts["multiline"] <- multiline
        else if (!missing(multi_line)) 
            opts["multiline"] <- multi_line
        opts
    })(locale = "en_US_POSIX")`: unused argument (locale = "en_US_POSIX")
    Backtrace:
        ▆
     1. └─cleanNLP::cnlp_annotate(un, verbose = FALSE) at test-annotation.R:9:3
     2.   └─cleanNLP:::annotate_with_stringi(input, verbose)
     3.     └─base::lapply(...)
     4.       └─cleanNLP (local) FUN(X[[i]], ...)
     5.         └─stringi::stri_detect(v, regex = "\\A[\\h\\n\\t\\f]+\\Z", locale = volatiles$stringi$locale)
     6.           └─stringi::stri_detect_regex(str, regex, ...)
     7.             └─base::do.call(stri_opts_regex, as.list(c(opts_regex, ...)))
    ── Error ('test-tools.R:9:3'): testing utils_tfidf ─────────────────────────────
    Error in `(function (case_insensitive, comments, dotall, dot_all = dotall, 
        literal, multiline, multi_line = multiline, unix_lines, uword, 
        error_on_unknown_escapes, time_limit = 0L, stack_limit = 0L) 
    {
        opts <- list()
        if (!missing(case_insensitive)) 
            opts["case_insensitive"] <- case_insensitive
        if (!missing(comments)) 
            opts["comments"] <- comments
        if (!missing(literal)) 
            opts["literal"] <- literal
        if (!missing(unix_lines)) 
            opts["unix_lines"] <- unix_lines
        if (!missing(uword)) 
            opts["uword"] <- uword
        if (!missing(error_on_unknown_escapes)) 
            opts["error_on_unknown_escapes"] <- error_on_unknown_escapes
        if (!missing(stack_limit)) 
            opts["stack_limit"] <- stack_limit
        if (!missing(time_limit)) 
            opts["time_limit"] <- time_limit
        if (!missing(dotall)) 
            opts["dotall"] <- dotall
        else if (!missing(dot_all)) 
            opts["dotall"] <- dot_all
        if (!missing(multiline)) 
            opts["multiline"] <- multiline
        else if (!missing(multi_line)) 
            opts["multiline"] <- multi_line
        opts
    })(locale = "en_US_POSIX")`: unused argument (locale = "en_US_POSIX")
    Backtrace:
        ▆
     1. └─cleanNLP::cnlp_annotate(un, verbose = FALSE) at test-tools.R:9:3
     2.   └─cleanNLP:::annotate_with_stringi(input, verbose)
     3.     └─base::lapply(...)
     4.       └─cleanNLP (local) FUN(X[[i]], ...)
     5.         └─stringi::stri_detect(v, regex = "\\A[\\h\\n\\t\\f]+\\Z", locale = volatiles$stringi$locale)
     6.           └─stringi::stri_detect_regex(str, regex, ...)
     7.             └─base::do.call(stri_opts_regex, as.list(c(opts_regex, ...)))
    ── Error ('test-tools.R:20:3'): testing tidy_pca ───────────────────────────────
    Error in `(function (case_insensitive, comments, dotall, dot_all = dotall, 
        literal, multiline, multi_line = multiline, unix_lines, uword, 
        error_on_unknown_escapes, time_limit = 0L, stack_limit = 0L) 
    {
        opts <- list()
        if (!missing(case_insensitive)) 
            opts["case_insensitive"] <- case_insensitive
        if (!missing(comments)) 
            opts["comments"] <- comments
        if (!missing(literal)) 
            opts["literal"] <- literal
        if (!missing(unix_lines)) 
            opts["unix_lines"] <- unix_lines
        if (!missing(uword)) 
            opts["uword"] <- uword
        if (!missing(error_on_unknown_escapes)) 
            opts["error_on_unknown_escapes"] <- error_on_unknown_escapes
        if (!missing(stack_limit)) 
            opts["stack_limit"] <- stack_limit
        if (!missing(time_limit)) 
            opts["time_limit"] <- time_limit
        if (!missing(dotall)) 
            opts["dotall"] <- dotall
        else if (!missing(dot_all)) 
            opts["dotall"] <- dot_all
        if (!missing(multiline)) 
            opts["multiline"] <- multiline
        else if (!missing(multi_line)) 
            opts["multiline"] <- multi_line
        opts
    })(locale = "en_US_POSIX")`: unused argument (locale = "en_US_POSIX")
    Backtrace:
        ▆
     1. └─cleanNLP::cnlp_annotate(un, verbose = FALSE) at test-tools.R:20:3
     2.   └─cleanNLP:::annotate_with_stringi(input, verbose)
     3.     └─base::lapply(...)
     4.       └─cleanNLP (local) FUN(X[[i]], ...)
     5.         └─stringi::stri_detect(v, regex = "\\A[\\h\\n\\t\\f]+\\Z", locale = volatiles$stringi$locale)
     6.           └─stringi::stri_detect_regex(str, regex, ...)
     7.             └─base::do.call(stri_opts_regex, as.list(c(opts_regex, ...)))

stri_opts_regex never had a locale argument, and in the most recent version of stringi, passing one generates an error instead of a warning.

Could you fix that on CRAN? Thanks.
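For readers following the backtrace, here is a minimal base-R sketch of why the call fails. It assumes only what the log shows: extra arguments are forwarded through `do.call` to an options constructor with a fixed argument list. The names `opts_demo` and `detect_demo` are hypothetical stand-ins, not stringi functions.

```r
# Hypothetical stand-in for an options constructor with a fixed argument
# list (no `...`), loosely modelled on the function body printed in the log.
opts_demo <- function(case_insensitive, multiline) {
  opts <- list()
  if (!missing(case_insensitive)) opts["case_insensitive"] <- case_insensitive
  if (!missing(multiline)) opts["multiline"] <- multiline
  opts
}

# Hypothetical wrapper that forwards `...` the way frame 7 of the
# backtrace does: do.call(<constructor>, as.list(c(opts, ...))).
detect_demo <- function(..., opts = NULL) {
  do.call(opts_demo, as.list(c(opts, ...)))
}

# A known option is accepted.
ok <- detect_demo(case_insensitive = TRUE)

# An unknown option such as `locale` has no matching formal argument,
# so R itself raises an "unused argument" error.
err <- tryCatch(
  detect_demo(locale = "en_US_POSIX"),
  error = function(e) conditionMessage(e)
)
print(err)
```

Because the constructor has no `...` to absorb unknown names, the error comes from R's argument matching rather than from any check inside the function.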

statsmaths commented 8 months ago

Yes, of course. Thanks for letting me know. I had been using it for stringi::stri_extract_all_boundaries and just copied it as an option to stringi::stri_detect. Am I correct that it can be used as an option when extracting the boundaries, or should it be dropped from both functions?

gagolews commented 8 months ago

Thanks and sorry for the trouble.

The *_boundaries, *_words, and *_coll-based search functions support the locale argument.

The *_regex and *_fixed functions do not.
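The distinction above can be sketched as follows. This is an illustration only, not the actual cleanNLP fix: the sample text and the "en_US" locale are assumptions chosen for the example, and the regex is the one shown in the backtrace.

```r
library(stringi)

# Illustrative input (an assumption, not taken from cleanNLP's tests).
text <- c("Hello world. How are you?", " \n\t ")

# Boundary-based functions forward extra arguments such as `type` and
# `locale` to the break iterator options, so a locale is accepted here.
sentences <- stri_extract_all_boundaries(
  text[1],
  type = "sentence",
  locale = "en_US"
)[[1]]

# Regex-based functions take no locale, so the call is made without one.
is_blank <- stri_detect_regex(text, "\\A[\\h\\n\\t\\f]+\\Z")

print(sentences)
print(is_blank)
```

Keeping the locale on the boundary call and dropping it from the regex call matches the rule stated above.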

statsmaths commented 8 months ago

This should be fixed in the latest commit. CRAN is down for me right now, but I will submit later today. Thanks again for all your work on stringi. Truly one of my favorite and most-used R packages.

statsmaths commented 8 months ago

On CRAN and working now. Thanks again.