microsoft / Cognitive-SpeakerRecognition-Windows

Windows SDK for the Microsoft Speaker Recognition API, part of Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/speaker-recognition-api

Speech Identification #2

Closed ManasaGanesh closed 8 years ago

ManasaGanesh commented 8 years ago

Hi,

I am using the Microsoft Cognitive Services speech API for my research. When I started using the identification API, I ran into a problem with female voices: the system falsely identifies the speaker, accepting a different female speaker as the enrolled one.

Steps I followed:

1. Enroll female speaker 1.
2. Try to identify speaker 2 against speaker 1.
3. The system identifies speaker 2 as speaker 1.

I don't see this problem with male speakers. I am not sure whether I need to set a parameter for female speakers or whether this is a flaw. Can anyone guide me on this?

momohs commented 8 years ago

Hello @ManasaGanesh, there are no gender-specific parameters for speaker recognition. Can you please share the test data you are using so that I can look into this further?

ManasaGanesh commented 8 years ago

Hi Team, I am attaching the data set I used in testing. I used 0.wav to enroll the speaker (Female 1) and 2.wav, 3.wav, and 4.wav for identification (Female 2).

The system identified the voice in 2.wav, 3.wav, and 4.wav as Female 1, which is incorrect.

Your help with this case is greatly appreciated.

Thanks, Manasa

New folder.zip

momohs commented 8 years ago

Thanks for sharing this @ManasaGanesh! I will check this with the team and come back to you.

The files sent are only 0.wav and 2.wav. It would be helpful if you could tell me the recording conditions for those files.

ManasaGanesh commented 8 years ago

All I did was use the data set from http://festvox.org/cmu_arctic/ for the two female speakers:

US English slt (female) (0.95): http://festvox.org/cmu_arctic/dbs_slt.html
US English clb (female) (0.95): http://festvox.org/cmu_arctic/dbs_clb.html

I then merged the clips into 60-second samples and set the sample rate and frequency as required by the system. Since the two speakers sound similar, the system falsely identifies the female speaker. I tried to add more samples, but due to limits on email size I am unable to attach them. I am happy to provide more samples.

Thanks, Manasa
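The merging step described above can be sketched with Python's standard `wave` module. This is only an illustration of joining short clips from one speaker into a roughly 60-second sample, not the script actually used in this thread. The function name `concat_wavs` and its parameters are hypothetical, and the sketch assumes every input clip already shares the same channel count, sample width, and sample rate. It does no resampling, so the format the service requires has to be checked separately.

```python
import wave


def concat_wavs(paths, out_path, target_seconds=60.0):
    """Append clips from `paths` to `out_path` until `target_seconds` is reached.

    Returns the duration written, in seconds. Hypothetical helper for
    illustration; all clips must share one PCM format.
    """
    paths = list(paths)
    if not paths:
        raise ValueError("no input clips given")

    fmt = None          # (channels, sample width in bytes, sample rate)
    frames_written = 0
    with wave.open(out_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as clip:
                clip_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                if fmt is None:
                    # The first clip defines the output format.
                    fmt = clip_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif clip_fmt != fmt:
                    raise ValueError(f"{path}: format {clip_fmt} differs from {fmt}")
                out.writeframes(clip.readframes(clip.getnframes()))
                frames_written += clip.getnframes()
            # Stop once enough audio has been collected.
            if frames_written / fmt[2] >= target_seconds:
                break
    return frames_written / fmt[2]
```

Because whole clips are appended, the result can run slightly past the target length. Mixing clips from different corpora or recording setups into a single sample would also mix recording channels, which may matter for the kind of error discussed in this thread.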

momohs commented 8 years ago

Thanks for your feedback @ManasaGanesh! Looking at the shared audio files, I would say that the gender of the speaker is not the cause of this error, but rather something else that we need to investigate e.g. the recording channel.

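May I highlight that our system is trained using state-of-the-art technology on balanced data that has speakers from both genders. You can read more about our system in this blog post: https://blogs.technet.microsoft.com/machinelearning/2015/12/14/now-available-speaker-video-apis-from-microsoft-project-oxford/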

ManasaGanesh commented 8 years ago

Thank you so much for your inputs. This tool is really helping me a lot in my research. I will test more female samples and see how it behaves. You mentioned balanced data in your earlier message; may I know what is meant by that? I can work on changing my dataset to follow that balanced form. Your help is greatly appreciated.

Thanks, Manasa

momohs commented 8 years ago

I just meant that the training data we used contains both males and females. Please let me know if you have any more comments about the service.

ManasaGanesh commented 8 years ago

I tested both male and female voices. The system behaved well for male voices: 1 case failed out of 20 samples. For female voices, 9 out of 10 cases failed. I am just concerned about this behavior. According to your inputs, this needs more investigation. If it can be fixed, this service will be excellent. I don't have any other comments or issues with this tool.

Thanks, Manasa
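To put the reported counts side by side (1 failure in 20 male trials, 9 failures in 10 female trials), here is a small sketch that computes the two failure rates and a one-sided Fisher exact p-value using only the standard library. The function names are hypothetical. The test also assumes the trials are independent, which may not hold here because the female clips come from only two speakers of the same corpus. It indicates whether the gap is likely to be chance; it does not say whether gender, recording channel, or something else causes it.

```python
from math import comb


def error_rate(failures, trials):
    """Fraction of trials in which the wrong speaker was accepted."""
    return failures / trials


def fisher_one_sided(fail_a, n_a, fail_b, n_b):
    """Probability that group A gets at least `fail_a` failures if the
    combined failures were distributed at random across both groups."""
    total_fails = fail_a + fail_b
    most_a_can_have = min(n_a, total_fails)
    favourable = sum(
        comb(n_a, k) * comb(n_b, total_fails - k)
        for k in range(fail_a, most_a_can_have + 1)
    )
    return favourable / comb(n_a + n_b, total_fails)


female_rate = error_rate(9, 10)   # counts taken from the comment above
male_rate = error_rate(1, 20)
p_value = fisher_one_sided(9, 10, 1, 20)
print(f"female: {female_rate:.0%}, male: {male_rate:.0%}, p = {p_value:.2e}")
```

With samples this small, a wider test set covering more speakers and more than one corpus would say more than the p-value alone.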

momohs commented 8 years ago

Were all the female test cases from the same corpus you shared earlier?

ManasaGanesh commented 8 years ago

Yes. I also tested non-English female speakers from a different dataset. I can share that if required.

Thanks, Manasa

momohs commented 8 years ago

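Please share the English part with me. At the moment, the service only supports the "en-us" locale, as mentioned in the documentation: https://dev.projectoxford.ai/docs/services/563309b6778daf02acc0a508/operations/5645c068e597ed22ec38f42e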

ManasaGanesh commented 8 years ago

I can share the English dataset. I have run the same tests with Arabic samples, and the system behaved well for male speakers. My understanding was that identification depends on the speaker's voice print, irrespective of language.

momohs commented 8 years ago

It is true that the problem might seem to be language independent. However, the phonetics of each language are different, and that plays an important role in the performance of the system. At the moment, we only support US English.

If I were in your place, I would try some more standard datasets, e.g. TIMIT.

ManasaGanesh commented 8 years ago

Hey, thanks for your input. But I see that TIMIT is not free. Could you suggest any open-source datasets of a good standard?

Regards, Manasa

momohs commented 8 years ago

I'm not sure which datasets are free and which are not. Please let me know if you have any further questions related to our service or SDK.

momohs commented 8 years ago

I'll close this for now. Please feel free to re-open if you have any further questions.