Open howardbaik opened 1 year ago
This seems reasonable but I have even more basic questions. 1) What is this package and how does it relate to text2speech package? 2) If we are going to work on maintaining it, should we transfer ownership to the lab or make a fork or copy that we maintain?
This seems reasonable but I have even more basic questions. 1) What is this package and how does it relate to text2speech package? 2) If we are going to work on maintaining it, should we transfer ownership to the lab or make a fork or copy that we maintain?
@cansavvy This package is the Microsoft engine that powers text2speech. I was writing tests for text2speech and found that mscstts::ms_synthesize()
had a bug, so I fixed it in this PR.
After I came back from the weekend, the bug resurfaced, so I was stuck on fixing the same bug for the past 4 hours 😂
I tried so many things to fix ms_synthesize()
, but none of them worked, even though the same command ran successfully in Mac Terminal. The problem lay with httr::POST()
not working correctly (or at least in the same way as this Mac terminal curl command:
curl -X POST \
-H "Content-Type: application/ssml+xml" \
-H "X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm" \
-H "Authorization: Bearer [TOKEN HERE]" \
-H "User-Agent: MyTextToSpeech" \
-H "Host: westus.tts.speech.microsoft.com" \
-d "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Microsoft Speech Service Text-to-Speech API</voice></speak>" \
"https://westus.tts.speech.microsoft.com/cognitiveservices/v1"
I found out that httr
is superseded and the package author recommends using httr2
instead. So, I studied the httr2
documentation and came up with these chained functions (that work for now):
# Create request
req <- request("https://westus.tts.speech.microsoft.com/cognitiveservices/v1")
# Specify HTTP headers
req <- req %>%
req_headers(
`Content-Type` = "application/ssml+xml",
`X-Microsoft-OutputFormat` = "riff-24khz-16bit-mono-pcm",
`Authorization` = paste("Bearer", token),
`User-Agent` = "MyTextToSpeech",
Host = "westus.tts.speech.microsoft.com") %>%
req_body_raw("<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Microsoft Speech Service Text-to-Speech API</voice></speak>")
# Perform a request and fetch the response
resp <- req %>% req_perform()
# Extract body (Binary data)
resp$body
# Transfer binary data to WAV file
output <- tempfile(fileext = ".wav")
writeBin(resp$body, con = output)
I'll incorporate these changes into ms_synthesize()
later!
I don’t think it’s a great idea to mix and match httr and httr2. Overall, you should be able to replicate cURL commands using just httr. I don’t think adding the extra dependency is worth it, unless we were going to rebuild most of the functions in httr2 unless you can identify something specifically that the second version of the packaging do that the first cannot.
On Tue, May 9, 2023 at 4:32 AM Howard Baek @.***> wrote:
@.**** commented on this pull request.
In R/ms_tts_auth.R https://github.com/muschellij2/mscstts/pull/14#discussion_r1187887097:
' @rdname ms_get_tts_key
' @export
ms_have_tts_key = function(api_key = NULL) { api_key = ms_get_tts_key(api_key = api_key, error = FALSE) !is.null(api_key) }
- +#' Set API Key as a global option
Done
— Reply to this email directly, view it on GitHub https://github.com/muschellij2/mscstts/pull/14#discussion_r1187887097, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIGPLQN4PFKH5BHM2MSAD3XFH6KFANCNFSM6AAAAAAXXT7NE4 . You are receiving this because you commented.Message ID: @.***>
-- Best, John
@howardbaek Thanks for digging into this a bit! @muschellij2 I totally agree we don't want to mix and match httr and httr2 functions. I think ideally we'd switch over everything to be httr2
compatible but right now that's probably not a priority level for our projects so it's probably best to just file this as an issue and return to it later.
I think it might make sense to discuss whether you are hoping to continue to maintain this package, @muschellij2 or whether this should be absorbed by jhudsl
github organization. This is up to you. I'm not sure what your bandwidth is for this and I also don't think we need to prioritize keeping microsoft text to speech functionality at this point in time (though it would be nice to have in the future perhaps).
I was bored on my 10 hour trip back from a conference, so I rewrote this whole package using httr2: https://github.com/howardbaek/mscstts2
I was bored on my 10 hour trip back from a conference, so I rewrote this whole package using httr2: https://github.com/howardbaek/mscstts2
Awesome! That's great! Can you file a pull request back to here or somehow provide a way for me (and/or @muschellij2 if he has the bandwitdth) to review it?
Warning: This Fix isn't Actually Working!!! ⚠️
Problem
I was getting a Forbidden HTTP 403 error when running
ms_synthesize()
This is what I tried
user_agent_hdr
) and "Host" (host_hdr
) headers.paste("Bearer", token)
), since the token that the API accepts is a Bearer token.xname
tovoice
on Line 72Also, I added roxygen titles to exported functions.
cc @cansavvy