microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: ErrorTypes (UnexpectedBreak, MissingBreak) not received in detailResult words from SDK #804

Closed: syama-aot closed this issue 5 months ago

syama-aot commented 6 months ago

What happened?

I downloaded the same audio file from the example provided on the Microsoft Pronunciation Assessment Tool (https://speech.microsoft.com/portal/pronunciationassessmenttool) portal. When I attempted to retrieve the detailed results using the SDK, I noticed a discrepancy. While the public portal displayed 2 missing breaks, 1 unexpected break, and 2 mispronunciations, I was only able to receive information about the mispronunciations from the SDK. Can you explain why this difference exists?

Version

1.34.0 (Latest)

What browser/platform are you seeing the problem on?

No response

Relevant log output

No response

glharper commented 6 months ago

@syama-aot Thanks for using the JS Speech SDK and for submitting this issue. After creating a PronunciationAssessmentConfig instance, you must set that instance's enableProsodyAssessment property to true, e.g.

        pronunciationAssessmentConfig.enableProsodyAssessment = true;

Hope that helps.
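For context, a minimal setup might look like the sketch below. This assumes speechConfig and audioConfig instances already exist, and the reference text is an illustrative placeholder:

    // Sketch: minimal pronunciation assessment setup (assumes speechConfig and
    // audioConfig already exist; the reference text is a placeholder).
    const pronunciationAssessmentConfig = new sdk.PronunciationAssessmentConfig(
      "reference text to assess",
      sdk.PronunciationAssessmentGradingSystem.HundredMark,
      sdk.PronunciationAssessmentGranularity.Phoneme
    );
    pronunciationAssessmentConfig.enableProsodyAssessment = true; // needed for prosody (break) feedback

    const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    pronunciationAssessmentConfig.applyTo(recognizer);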

syama-aot commented 6 months ago

@glharper, thank you for the reply. I set enableProsodyAssessment as suggested, but I am still receiving only Mispronunciation error types (MissingBreak and UnexpectedBreak are still missing from the API response). This is the code I have:

    const pronunciationAssessmentConfig = sdk.PronunciationAssessmentConfig.fromJSON(
      '{ "GradingSystem": "FivePoint", "Granularity": "Word", "EnableMiscue": "False" }'
    );
    pronunciationAssessmentConfig.enableProsodyAssessment = true;

    pronunciationAssessmentConfig.enableContentAssessmentWithTopic(
      "Talk about your day today"
    );

    // Set the recognition language to English.
    speechConfig.speechRecognitionLanguage = "en-US";
    let startTime = 0; // Initialize start time

    // Create the speech recognizer.
    var reco = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    pronunciationAssessmentConfig.applyTo(reco);

    let totalResultText = ""; // Accumulate recognized text
    let pronunciationScores = []; // Store pronunciation assessment scores
    let recognitionComplete = false; // Flag to track recognition completion
    let contentAssessmentResult = null;

    function onRecognizedResult(result) {
      if (!recognitionComplete) {
        totalResultText += result.text; // Append recognized text
        const pronunciationResult =
          sdk.PronunciationAssessmentResult.fromResult(result);
        pronunciationScores.push({
          text: result.text,
          accuracyScore: pronunciationResult.accuracyScore,
          pronunciationScore: pronunciationResult.pronunciationScore,
          completenessScore: pronunciationResult.completenessScore,
          fluencyScore: pronunciationResult.fluencyScore,
          prosodyScore: pronunciationResult.prosodyScore,
          result: pronunciationResult.contentAssessmentResult,
        });

        _.forEach(pronunciationResult.detailResult.Words, (word, idx) => {
          console.log(
            "    ",
            idx + 1,
            ": word: ",
            word.Word,
            "\taccuracy score: ",
            word.PronunciationAssessment.AccuracyScore,
            "\terror type: ",
            word.PronunciationAssessment.ErrorType,
            ";"
          );
        });
      }
    }

    reco.recognizing = (s, e) => {
      onRecognizedResult(e.result);
    };

    reco.recognized = (s, e) => {
      if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
        console.log(`Recognized: ${e.result.text}`);
        onRecognizedResult(e.result);
      } else if (e.result.reason === sdk.ResultReason.NoMatch) {
        console.log("No speech could be recognized.");
      }
    };

    reco.sessionStopped = (s, e) => {
      console.log("Session stopped event received.");
      recognitionComplete = true; // Set recognition completion flag

      const endTime = Date.now(); // Measure end time
      const totalTimeSpent = endTime - startTime; // Calculate total time spent
      console.log("Total time spent (ms):", totalTimeSpent);
    };

    reco.startContinuousRecognitionAsync(
      () => {
        console.log("Continuous recognition started.");
        startTime = Date.now(); // Set start time when recognition starts
      },
      (err) => {
        console.error("Error starting continuous recognition:", err);
      }
    );

glharper commented 5 months ago

@syama-aot In the Speech Studio, looking at the sample code (under Developer Resources) for JavaScript, I see this comment:

        // For continuous pronunciation assessment mode, the service won't return the words with `Insertion` or `Omission`
        // We need to compare with the reference text after received all recognized words to get these error words.

The sample code then details logic to perform that comparison. Is that logic what you're asking for?
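For illustration, a minimal sketch of that comparison, assuming the jsdiff ("diff") npm package; the function and variable names here are illustrative, not the exact sample code:

    // Sketch: mark Omission/Insertion errors by diffing the reference text
    // against the recognized words (assumes the jsdiff npm package).
    const Diff = require("diff");

    function markMiscues(referenceText, recognizedWords) {
      // Normalize the reference text into lowercase words without punctuation.
      const referenceWords = referenceText
        .toLowerCase()
        .split(" ")
        .map((w) => w.replace(/[^a-z']/g, ""))
        .filter((w) => w.length > 0);

      const diff = Diff.diffArrays(
        referenceWords,
        recognizedWords.map((w) => w.Word.toLowerCase())
      );

      const finalWords = [];
      let idx = 0; // index into recognizedWords
      for (const part of diff) {
        for (const value of part.value) {
          if (part.removed) {
            // In the reference but not recognized: an omitted word.
            finalWords.push({
              Word: value,
              PronunciationAssessment: { ErrorType: "Omission", AccuracyScore: 0 },
            });
          } else if (part.added) {
            // Recognized but not in the reference: an inserted word.
            const word = recognizedWords[idx++];
            word.PronunciationAssessment.ErrorType = "Insertion";
            finalWords.push(word);
          } else {
            // Matched word: keep the service's assessment as-is.
            finalWords.push(recognizedWords[idx++]);
          }
        }
      }
      return finalWords;
    }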

LeoLiu-Xingchi commented 5 months ago

@glharper I have a similar question with the SDK in Swift. I did enable:

pronAssessmentConfig.enableProsodyAssessment()

But the word-level assessment results never return UnexpectedBreak/MissingBreak. The only errorType I can get is "Mispronunciation".

Here is my setup:

let speechRecognizer = try! SPXSpeechRecognizer(speechConfiguration: speechConfig, language: "en-US", audioConfiguration: audioConfig)

let pronAssessmentConfig = try! SPXPronunciationAssessmentConfiguration("", gradingSystem: SPXPronunciationAssessmentGradingSystem.hundredMark, granularity: SPXPronunciationAssessmentGranularity.phoneme, enableMiscue: false)

pronAssessmentConfig.enableProsodyAssessment()
try! pronAssessmentConfig.apply(to: speechRecognizer)

alielbekov commented 3 months ago

Error types related to breaks include UnexpectedBreak and MissingBreak. The current version doesn't return a break error type directly; you need to set thresholds on the UnexpectedBreak – Confidence and MissingBreak – Confidence fields to decide whether there's an unexpected break or missing break before the word.

The suggested threshold for both confidence scores is 0.75. That means if the value of UnexpectedBreak – Confidence is larger than 0.75, there's an unexpected break, and if the value of MissingBreak – Confidence is larger than 0.75, there's a missing break. While 0.75 is the recommended value, it's better to adjust the thresholds based on your own scenario. If you want different detection sensitivity for the two breaks, you can assign different thresholds to the UnexpectedBreak – Confidence and MissingBreak – Confidence fields.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-javascript#unscripted-assessment-results
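For example, a minimal sketch of applying those thresholds, assuming the word-level JSON shape documented for unscripted assessment (Feedback.Prosody.Break.UnexpectedBreak.Confidence and Feedback.Prosody.Break.MissingBreak.Confidence); adjust the property path if your SDK version differs:

    // Sketch: derive break error types from per-word confidence scores
    // (assumes the documented unscripted-assessment JSON shape).
    const THRESHOLD = 0.75; // recommended starting point; tune per scenario

    function breakErrorTypes(word) {
      const errors = [];
      const brk = word.PronunciationAssessment?.Feedback?.Prosody?.Break;
      if (!brk) return errors;
      if ((brk.UnexpectedBreak?.Confidence ?? 0) > THRESHOLD) {
        errors.push("UnexpectedBreak"); // unexpected pause before this word
      }
      if ((brk.MissingBreak?.Confidence ?? 0) > THRESHOLD) {
        errors.push("MissingBreak"); // missing pause before this word
      }
      return errors;
    }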