jhudsl / text2speech

Text to Speech
http://jhudatascience.org/text2speech/
GNU General Public License v3.0

CRAN Update #35

Closed howardbaik closed 1 year ago

howardbaik commented 1 year ago

Now that Loqui and ari are in a stable condition, we can submit the updates we made to this package to CRAN.

howardbaik commented 1 year ago

Running devtools::check(remote = TRUE, manual = TRUE) fails intermittently: the error below appears on the first run, disappears on the second, then shows up again.

── Test failures ──────────────────────────────────────────────────────────────────────────────────────── testthat ────

> # This file is part of the standard setup for testthat.
> # It is recommended that you do not modify it.
> #
> # Where should you do additional test configuration?
> # Learn more about the roles of various files in:
> # * https://r-pkgs.org/tests.html
> # * https://testthat.r-lib.org/reference/test_package.html#special-files
> 
> library(testthat)
> library(text2speech)
> 
> test_check("text2speech")
2023-07-10 21:23:18 -- Calling text-to-speech API: Algorithmic complexity is a key consideration
                                                when designing efficient solutions for large-scale data processing
 > tts_models/en/ljspeech/tacotron2-DDC_ph is already downloaded.
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > vocoder_models/en/ljspeech/univnet is already downloaded.
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/Users/howardbaek/Library/Application Support/tts/tts_models--en--ljspeech--tacotron2-DDC_ph/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 2
 > Vocoder Model: univnet
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/Users/howardbaek/Library/Application Support/tts/vocoder_models--en--ljspeech--univnet/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Generator Model: univnet_generator
 > Discriminator Model: univnet_discriminator
 > Text: Algorithmic complexity is a key consideration
                                                when designing efficient solutions for large-scale data processing
 > Text splitted to sentences.
['Algorithmic complexity is a key consideration', 'when designing efficient solutions for large-scale data processing']
wɛn dɪzaɪnɪŋ ɪfɪʃənt səluʃənz fɚ lɑɹd͡ʒ skeɪl dætə pɹɑsɛsɪŋ
 [!] Character '͡' not found in the vocabulary. Discarding it.
 > Processing time: 1.1206820011138916
 > Real-time factor: 0.14192610575124812
 > Saving output to /private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T//RtmpVXT96Q/working_dir/Rtmp8LyiOp/file9d8f406af12d.mp3

 *** caught segfault ***
address 0x4ef5d5000000601, cause 'invalid permissions'

Traceback:
 1: tuneR::readMP3(file)
 2: FUN(X[[i]], ...)
 3: lapply(res, tts_audio_read, output_format = audio_type)
 4: FUN(X[[i]], ...)
 5: lapply(text, function(string) {    string_processed = tts_split_text(string, limit = limit)    res = vapply(string_processed, function(tt) {        output_path = tts_temp_audio(audio_type)        tts_args <- paste0("--text", " ", shQuote(tt), " ", "--model_name",             " ", model_name, " ", "--vocoder_name", " ", vocoder_name,             " ", "--out_path /private", output_path)        res <- withr::with_path(process_coqui_path(exec_path),             system2("tts", tts_args))        output_path    }, FUN.VALUE = character(1L), USE.NAMES = FALSE)    out = lapply(res, tts_audio_read, output_format = audio_type)    df = dplyr::tibble(original_text = string, text = string_processed,         wav = out, file = normalizePath(res))})
 6: tts_coqui(text = text, exec_path = coqui_path, output_format = output_format,     bind_audio = bind_audio, ...)
 7: tts("Algorithmic complexity is a key consideration\n                                                when designing efficient solutions for large-scale data processing",     service = company)
 8: rlang::eval_tidy(code, args)
 9: eval(code, test_env)
10: eval(code, test_env)
11: withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error)
12: doTryCatch(return(expr), name, parentenv, handler)
13: tryCatchOne(expr, names, parentenv, handlers[[1L]])
14: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
15: doTryCatch(return(expr), name, parentenv, handler)
16: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
17: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error), error = handle_fatal,     skip = function(e) {    })
19: test_code(desc, code, env = parent.frame(), reporter = reporter)
20: testthat::test_that(completed_desc, rlang::eval_tidy(code, args))
21: withCallingHandlers(testthat::test_that(completed_desc, rlang::eval_tidy(code,     args)), testthat_braces_warning = function(cnd) {    rlang::cnd_muffle(cnd)}, warning = function(cnd) {    if (cnd$message == paste("The `code` argument to `test_that()` must be a braced expression",         "to get accurate file-line information for failures.")) {        rlang::cnd_muffle(cnd)    }})
22: .f(tts_auth = .l[[1L]][[i]], company = .l[[2L]][[i]], .test_name = .l[[3L]][[i]],     ...)
23: .Primitive(".Call")(list(name = "pmap_impl", address = <pointer: 0x14b7f5ff0>,     dll = list(name = "purrr", path = "/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/purrr/libs/purrr.so",         dynamicLookup = FALSE, handle = <pointer: 0x92d427a0>,         info = <pointer: 0x14b7498e0>), numParameters = 8L),     <environment>, "list", FALSE, 4L, NULL, 4L, c("tts_auth",     "company", ".test_name"), 3L)
24: call_with_cleanup(pmap_impl, environment(), .type, .progress,     n, names, i, call_names, call_n)
25: withCallingHandlers(expr, error = function(cnd) {    if (i == 0L) {    }    else {        message <- c(i = "In index: {i}.")        if (!is.null(names) && !is.na(names[[i]]) && names[[i]] !=             "") {            name <- names[[i]]            message <- c(message, i = "With name: {name}.")        }        else {            name <- NULL        }        cli::cli_abort(message, location = i, name = name, parent = cnd,             call = error_call, class = "purrr_error_indexed")    }})
26: with_indexed_errors(i = i, names = names, error_call = .purrr_error_call,     call_with_cleanup(pmap_impl, environment(), .type, .progress,         n, names, i, call_names, call_n))
27: pmap_("list", .l, .f, ..., .progress = .progress)
28: purrr::pmap(all_pars, build_and_run_test, desc = desc_stub, code = captured)
29: patrick::with_parameters_test_that("test tts() on Amazon, Google, Microsoft, and Coqui TTS) ",     {        testthat::skip_on_cran()        if (company != "coqui") {            if (tts_auth) {                response_df = tts("Algorithmic complexity is a key consideration\n                                                when designing efficient solutions for large-scale data processing",                   service = company)                audio_value = response_df$wav[[1]]                audio_path = response_df$file[[1]]                testthat::expect_s3_class(response_df, "data.frame")                testthat::expect_s4_class(audio_value, "Wave")                testthat::expect_equal(file.exists(audio_path),                   TRUE)            }        }        else {            tts_auth <- try(find_coqui())            if (!inherits(tts_auth, "try-error")) {                response_df = tts("Algorithmic complexity is a key consideration\n                                                when designing efficient solutions for large-scale data processing",                   service = company)                audio_value = response_df$wav[[1]]                audio_path = response_df$file[[1]]                testthat::expect_s3_class(response_df, "data.frame")                testthat::expect_s4_class(audio_value, "Wave")                testthat::expect_equal(file.exists(audio_path),                   TRUE)            }        }    }, tts_auth = c(tts_amazon_auth(getOption("aws_access_key")),         tts_google_auth(getOption("google_api_key")), tts_microsoft_auth(key_or_json_file = getOption("ms_api_key"),             region = "westus"), TRUE), company = c("amazon",         "google", "microsoft", "coqui"))
30: eval(code, test_env)
31: eval(code, test_env)
32: withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error)
33: doTryCatch(return(expr), name, parentenv, handler)
34: tryCatchOne(expr, names, parentenv, handlers[[1L]])
35: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
36: doTryCatch(return(expr), name, parentenv, handler)
37: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
38: tryCatchList(expr, classes, parentenv, handlers)
39: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error), error = handle_fatal,     skip = function(e) {    })
40: test_code(NULL, exprs, env)
41: source_file(path, env(env), wrap = wrap)
42: FUN(X[[i]], ...)
43: lapply(test_paths, test_one_file, env = env, wrap = wrap)
44: doTryCatch(return(expr), name, parentenv, handler)
45: tryCatchOne(expr, names, parentenv, handlers[[1L]])
46: tryCatchList(expr, classes, parentenv, handlers)
47: tryCatch(code, testthat_abort_reporter = function(cnd) {    cat(conditionMessage(cnd), "\n")    NULL})
48: with_reporter(reporters$multi, lapply(test_paths, test_one_file,     env = env, wrap = wrap))
49: test_files_serial(test_dir = test_dir, test_package = test_package,     test_paths = test_paths, load_helpers = load_helpers, reporter = reporter,     env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package)
50: test_files(test_dir = path, test_paths = test_paths, test_package = package,     reporter = reporter, load_helpers = load_helpers, env = env,     stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package, parallel = parallel)
51: test_dir("testthat", package = package, reporter = reporter,     ..., load_package = "installed")
52: test_check("text2speech")
An irrecoverable exception occurred. R is aborting now ...

1 error ✖ | 0 warnings ✔ | 1 note ✖

howardbaik commented 1 year ago

Looks like a segmentation fault: https://stackoverflow.com/questions/2346806/what-is-a-segmentation-fault

muschellij2 commented 1 year ago

I see an “invalid permissions” in there. Can you try assertthat::is.readable on the resulting mp3 and make sure that it returns TRUE?
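
A minimal sketch of that check, using base R only so it runs without extra packages. The helper name audio_file_status() is hypothetical and not part of text2speech; it reports whether a file exists, is readable, and is non-empty. Note that the "invalid permissions" text in a segfault message generally describes a memory access rather than file permissions, so this check may well pass even when the read crashes.

```r
# Hypothetical helper (not part of text2speech): summarise whether an audio
# file looks safe to read. Mirrors the intent of assertthat::is.readable().
audio_file_status <- function(path) {
  exists <- file.exists(path)
  list(
    exists = exists,
    # file.access() returns 0 when the permission is granted (mode 4 = read)
    readable = exists && unname(file.access(path, mode = 4)) == 0,
    # A zero-byte file would suggest the synthesis step wrote nothing
    size_bytes = if (exists) file.size(path) else NA_real_
  )
}

# Demonstration on a throwaway file rather than real TTS output
demo_path <- tempfile(fileext = ".mp3")
writeBin(as.raw(c(1, 2, 3)), demo_path)
status <- audio_file_status(demo_path)
print(status)
```

In the failing test, the path to inspect would be the one printed after "Saving output to" in the log.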

howardbaik commented 1 year ago

The tricky thing is whenever this happens, it crashes RStudio and erases the temp directory, so the resulting mp3 is deleted.
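
One possible way around the crashing session, sketched under assumptions: run the risky call in a separate R process with the callr package, so that a crash there comes back as an ordinary R error in the calling session. The wrapper name run_isolated() is hypothetical, and the example uses a harmless stand-in function rather than the real tts() call.

```r
# Hypothetical wrapper: evaluate `fun` in a fresh R process via callr.
# If the child process fails or crashes, callr::r() signals an error,
# which is caught here instead of taking down the interactive session.
run_isolated <- function(fun, args = list()) {
  tryCatch(
    list(ok = TRUE, value = callr::r(fun, args = args)),
    error = function(e) list(ok = FALSE, value = conditionMessage(e))
  )
}

# Harmless stand-in for the call that crashes
result <- run_isolated(function(x) x + 1, args = list(x = 1))
print(result$ok)
```

To investigate the real failure, the function passed in could call tuneR::readMP3() on a copy of the mp3 saved outside tempdir(); that setup is an assumption and was not tried in this thread.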

howardbaik commented 1 year ago

I had a gut feeling that this is related to https://github.com/jhudsl/ari/pull/45#issuecomment-1574334395, so I hard-coded the output of tts() with coqui as the engine to be a WAV file.

Tests shouldn't be a problem on CRAN since I added testthat::skip_on_cran().
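
For illustration, a minimal sketch of reading audio by file extension so that WAV output goes through tuneR::readWave() rather than tuneR::readMP3(). The function read_audio() is hypothetical and is not the package's tts_audio_read(); the round trip uses a synthetic tone instead of Coqui output.

```r
library(tuneR)

# Hypothetical reader: choose the tuneR function from the file extension.
read_audio <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    wav = tuneR::readWave(path),
    mp3 = tuneR::readMP3(path),
    stop("Unsupported audio format: ", ext)
  )
}

# Round trip: write a short 16-bit synthetic tone to WAV, then read it back
tone <- tuneR::sine(440, duration = 1000, bit = 16)
wav_path <- tempfile(fileext = ".wav")
tuneR::writeWave(tone, filename = wav_path)
wav_back <- read_audio(wav_path)
print(methods::is(wav_back, "Wave"))
```

Dispatching on the extension keeps the MP3 path available for the other services while letting the Coqui path avoid readMP3() entirely.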

muschellij2 commented 1 year ago

You can use readWave from tuneR if you want. I would not recommend keeping those tests in the suite at all if they crash. These should be fixed, even if they fail on CRAN, because that condition will fail for users.