howardbaik commented 1 year ago

Warning: This Fix isn't Actually Working!!! ⚠️

Problem

I was getting a Forbidden HTTP 403 error when running ms_synthesize()

This is what I tried

Adding the "User-Agent" (user_agent_hdr) and "Host" (host_hdr) headers.
Pasting "Bearer" in front of token (paste("Bearer", token)), since the token that the API accepts is a Bearer token.
Changed variable name from xname to voice on Line 72

Also, I added roxygen titles to exported functions.

cc @cansavvy

cansavvy commented 1 year ago

This seems reasonable but I have even more basic questions. 1) What is this package and how does it relate to text2speech package? 2) If we are going to work on maintaining it, should we transfer ownership to the lab or make a fork or copy that we maintain?

howardbaik commented 1 year ago

This seems reasonable but I have even more basic questions. 1) What is this package and how does it relate to text2speech package? 2) If we are going to work on maintaining it, should we transfer ownership to the lab or make a fork or copy that we maintain?

@cansavvy This package is the Microsoft engine that powers text2speech. I was writing tests for text2speech and found that mscstts::ms_synthesize() had a bug, so I fixed it in this PR.

howardbaik commented 1 year ago

Update

After I came back from the weekend, the bug resurfaced, so I was stuck on fixing the same bug for the past 4 hours 😂 I tried so many things to fix ms_synthesize(), but none of them worked, even though the same command ran successfully in Mac Terminal. The problem lay with httr::POST() not working correctly (or at least in the same way as this Mac terminal curl command:

curl -X POST \
     -H "Content-Type: application/ssml+xml" \
     -H "X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm" \
     -H "Authorization: Bearer  [TOKEN HERE]" \
     -H "User-Agent: MyTextToSpeech" \
     -H "Host: westus.tts.speech.microsoft.com" \
     -d "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Microsoft Speech Service Text-to-Speech API</voice></speak>" \
     "https://westus.tts.speech.microsoft.com/cognitiveservices/v1"

I found out that httr is superseded and the package author recommends using httr2 instead. So, I studied the httr2 documentation and came up with these chained functions (that work for now):

# Create request
req <- request("https://westus.tts.speech.microsoft.com/cognitiveservices/v1")

# Specify HTTP headers
 req <- req %>% 
     req_headers(
         `Content-Type` = "application/ssml+xml",
         `X-Microsoft-OutputFormat` = "riff-24khz-16bit-mono-pcm",
         `Authorization` = paste("Bearer",  token),
         `User-Agent` = "MyTextToSpeech",
         Host = "westus.tts.speech.microsoft.com") %>% 
     req_body_raw("<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Microsoft Speech Service Text-to-Speech API</voice></speak>")

# Perform a request and fetch the response
resp <- req %>% req_perform()

# Extract body (Binary data)
resp$body

# Transfer binary data to WAV file
output <- tempfile(fileext = ".wav")
writeBin(resp$body, con = output)

I'll incorporate these changes into ms_synthesize() later!

muschellij2 commented 1 year ago

I don’t think it’s a great idea to mix and match httr and httr2. Overall, you should be able to replicate cURL commands using just httr. I don’t think adding the extra dependency is worth it, unless we were going to rebuild most of the functions in httr2 unless you can identify something specifically that the second version of the packaging do that the first cannot.

On Tue, May 9, 2023 at 4:32 AM Howard Baek @.***> wrote:

@.**** commented on this pull request.

In R/ms_tts_auth.R https://github.com/muschellij2/mscstts/pull/14#discussion_r1187887097:

' @rdname ms_get_tts_key

' @export

ms_have_tts_key = function(api_key = NULL) { api_key = ms_get_tts_key(api_key = api_key, error = FALSE) !is.null(api_key) }

- +#' Set API Key as a global option

Done

— Reply to this email directly, view it on GitHub https://github.com/muschellij2/mscstts/pull/14#discussion_r1187887097, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIGPLQN4PFKH5BHM2MSAD3XFH6KFANCNFSM6AAAAAAXXT7NE4 . You are receiving this because you commented.Message ID: @.***>

-- Best, John

cansavvy commented 1 year ago

@howardbaek Thanks for digging into this a bit! @muschellij2 I totally agree we don't want to mix and match httr and httr2 functions. I think ideally we'd switch over everything to be httr2 compatible but right now that's probably not a priority level for our projects so it's probably best to just file this as an issue and return to it later.

I think it might make sense to discuss whether you are hoping to continue to maintain this package, @muschellij2 or whether this should be absorbed by jhudsl github organization. This is up to you. I'm not sure what your bandwidth is for this and I also don't think we need to prioritize keeping microsoft text to speech functionality at this point in time (though it would be nice to have in the future perhaps).

howardbaik commented 1 year ago

I was bored on my 10 hour trip back from a conference, so I rewrote this whole package using httr2: https://github.com/howardbaek/mscstts2

cansavvy commented 1 year ago

I was bored on my 10 hour trip back from a conference, so I rewrote this whole package using httr2: https://github.com/howardbaek/mscstts2

Awesome! That's great! Can you file a pull request back to here or somehow provide a way for me (and/or @muschellij2 if he has the bandwitdth) to review it?

jhudsl / mscstts

Fixed HTTP 403 Error #14

Warning: This Fix isn't Actually Working!!! ⚠️

Problem

This is what I tried

Update

@.**** commented on this pull request.

' @rdname ms_get_tts_key

' @export