noopkat / ms-bing-speech-service

NodeJS service wrapper for Microsoft Speech API and Custom Speech Service
MIT License

How to get results with punctuation #17

Closed: Capotasto closed this issue 5 years ago

Capotasto commented 6 years ago

I ran the example, but there is no punctuation in the result.

I also tried the C# desktop demo, and its results do contain punctuation: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/getstarted/getstartedcsharpdesktop

noopkat commented 6 years ago

Hi there,

When you ran the C# desktop example, did you use the same audio file as the one provided in this repo?

Capotasto commented 6 years ago

@noopkat Hi. I tried the same file as the C# demo.

The file name is batman.wav.

The result is below. As you can see, the transcript is split into several segments, but within each segment the sentences run together: apart from the final period of each segment, there are no commas or periods.

$ node examples/node/sendFile.js

service started
speech turn started true
speech start detected
{ RecognitionStatus: 'Success',
  DisplayText: 'Skills and abilities Batman has no inherent super powers he relies on his own scientific knowledge detective skills and athletic prowess in the stories Batman is regarded as one of the world\'s greatest detective if not the world\'s greatest crime solver Batman has been repeatedly described as having genius level intellect one of the greatest martial artists midi see universe and having peak human physical conditioning use traveled the world acquiring the skills needed to aid his crusade grounds crime.',
  Offset: 5700000,
  Duration: 340400000 }
{ RecognitionStatus: 'Success',
  DisplayText: 'His knowledge and expertise in almost every discipline known to man is nearly unparalleled by any other character in the universe that man\'s inexhaustible wealth allows him to access advanced technology as a proficient scientists use able to use and modify those technologies to his advantage Batman describes Superman as the most dangerous man on earth able to defeat a team of super powered extraterrestrials by himself in order to rescue his imprisoned teammates.',
  Offset: 355400000,
  Duration: 339400000 }
{ RecognitionStatus: 'Success',
  DisplayText: 'In the first storyline Superman also considers Batman to be one of the most brilliant minds on the planet Batman has the ability to function under great physical pain and withstand mind control he is a master of disguise multilingual an expert in espionage often gathering information under different identities.',
  Offset: 708800000,
  Duration: 251700000 }
{ RecognitionStatus: 'Success',
  DisplayText: 'Batman\'s karate judo and jujitsu training has made him after master of stealth and escaped allowing too much allowing him to appear and disappear at will and to break free.',
  Offset: 969700000,
  Duration: 140800000 }
speech end detected
speech turn ended
speech turn started true
speech start detected

noopkat commented 6 years ago

Thanks for the additional info. I've seen this myself with my own sample files, but assumed the scant punctuation was a limitation of the service itself. Thanks for comparing this SDK with the C# desktop one.

I'm not entirely sure why the results differ in punctuation, so I'll continue to investigate. Could you please attach a paste or screenshot of the C# desktop sample's output so I can compare?

noopkat commented 6 years ago

In the meantime, I tried out the official JavaScript SDK sample and uploaded the batman sample audio file. The results are identical to the output of this library:

{
   "RecognitionStatus": "Success",
   "DisplayText": "Skills and abilities Batman has no inherent super powers he relies on his own scientific knowledge detective skills and athletic prowess in the stories Batman is regarded as one of the world's greatest detective if not the world's greatest crime solver Batman has been repeatedly described as having genius level intellect one of the greatest martial artists midi see universe and having peak human physical conditioning use traveled the world acquiring the skills needed to aid his crusade grounds crime.",
   "Offset": 5700000,
   "Duration": 340400000
}
{
   "RecognitionStatus": "Success",
   "DisplayText": "His knowledge and expertise in almost every discipline known to man is nearly unparalleled by any other character in the universe that man's inexhaustible wealth allows him to access advanced technology as a proficient scientists use able to use and modify those technologies to his advantage Batman describes Superman as the most dangerous man on earth able to defeat a team of super powered extraterrestrials by himself in order to rescue his imprisoned teammates.",
   "Offset": 355400000,
   "Duration": 339400000
}
{
   "RecognitionStatus": "Success",
   "DisplayText": "In the first storyline Superman also considers Batman to be one of the most brilliant minds on the planet Batman has the ability to function under great physical pain and withstand mind control he is a master of disguise multilingual an expert in espionage often gathering information under different identities.",
   "Offset": 708800000,
   "Duration": 251700000
}
{
   "RecognitionStatus": "Success",
   "DisplayText": "Batman's karate judo and jujitsu training has made him after master of stealth and escaped allowing too much allowing him to appear and disappear at will and to break free.",
   "Offset": 969700000,
   "Duration": 140800000
}
{
   "RecognitionStatus": "EndOfDictation",
   "Offset": 1145400000,
   "Duration": 0
}
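Both outputs share the same shape: one `Success` message per recognized segment, followed by an `EndOfDictation` message. As a minimal sketch, assuming the messages have been collected into an array of plain objects like those above, and assuming `Offset` and `Duration` are in 100-nanosecond ticks (the unit the Microsoft speech REST API documents for these fields), the hypothetical helpers below list each segment with its start and end time in seconds. This does not add punctuation; it only makes the service's segmentation easier to inspect.

```javascript
// Assumption: Offset and Duration are 100-nanosecond ticks (10,000,000 per second).
const TICKS_PER_SECOND = 1e7;

// Hypothetical helper: convert a tick count to seconds.
function ticksToSeconds(ticks) {
  return ticks / TICKS_PER_SECOND;
}

// Hypothetical helper: keep only successful segments and attach start/end
// times in seconds. Messages with any other status (e.g. "EndOfDictation")
// are skipped.
function summarizePhrases(messages) {
  return messages
    .filter((m) => m.RecognitionStatus === "Success")
    .map((m) => ({
      text: m.DisplayText,
      startSec: ticksToSeconds(m.Offset),
      endSec: ticksToSeconds(m.Offset + m.Duration),
    }));
}

// Usage with the numeric fields of the first and last messages above
// (DisplayText shortened here for brevity).
const messages = [
  {
    RecognitionStatus: "Success",
    DisplayText: "Skills and abilities Batman has no inherent super powers",
    Offset: 5700000,
    Duration: 340400000,
  },
  { RecognitionStatus: "EndOfDictation", Offset: 1145400000, Duration: 0 },
];

for (const p of summarizePhrases(messages)) {
  console.log(`${p.startSec.toFixed(2)}s to ${p.endSec.toFixed(2)}s: ${p.text}`);
}
```

Under that unit assumption, the `EndOfDictation` offset of 1145400000 corresponds to roughly 114.5 seconds of audio.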

So I don't think this issue is isolated to this library. I'm going to follow up with the Speech API product team if possible and let you know if I'm able to find out more. There could be a number of reasons why this is happening!

Thanks again for reporting this, or I wouldn't have realised something was up 😄


Capotasto commented 6 years ago

@noopkat Thank you for trying the official JavaScript SDK. Yes, that one also produces no punctuation. I have tried several Speech Service libraries so far, and the libraries below can output results with punctuation.

So I guess those two libraries might call a different Speech API than the JS one does.

> I'm going to follow up with the Speech API product team if possible and let you know if I'm able to find out more about this.

Thank you so much :) In the meantime, though, I'm going to use the C# library, because I need to split the text into sentences for LUIS.

Capotasto commented 6 years ago

Here is the C# library's result. The part I want you to look at is the "Final n-BEST Results" section: those results include almost perfect punctuation.

csharp_speech_to_text_result.txt

noopkat commented 6 years ago

@Capotasto thank you so much for attaching this. The "Final n-BEST Results" section was the last piece of the puzzle I needed to follow up with the product team. I'll report back with what I find out!

noopkat commented 5 years ago

Hi @Capotasto,

I wasn't able to get you the answer you needed on this issue.

However, since then, Microsoft has released a new official SDK with Node.js support for the unified Microsoft Speech Services (formerly Bing Speech Service). I believe it shouldn't have the same punctuation inconsistencies. If you still need a library like this one, I'd recommend checking it out instead: https://github.com/Azure-Samples/cognitive-services-speech-sdk

Therefore I'll respectfully close this issue, and I will be deprecating and archiving this repo 🙇‍♀ Thanks again for your contribution.