microsoft / Cognitive-SpeakerRecognition-Python

Python SDK for the Microsoft Speaker Recognition API, part of Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/speaker-recognition-api

AN EASIER WAY TO GET STARTED #15

Open Lord-V15 opened 2 years ago

Lord-V15 commented 2 years ago

For anyone who tries this repo in the future, here is an easier way in. I spent a lot of time on Speaker Recognition, and the official docs say one thing while the provided samples do another. The snippets below follow the official REST API docs for the text-independent method.

ONCE AND FOR ALL, IF YOU WANT TO USE PYTHON, THIS IS THE EASIEST AND BEST WAY I FOUND :

CREATE PROFILE 👇

import http.client
import json

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY',
    'Content-type': 'application/json'}

foo = {'locale': 'en-us'}
json_data = json.dumps(foo)

conn.request(method='POST', url='/speaker/verification/v2.0/text-independent/profiles', 
             body=json_data, headers=headers)
response = conn.getresponse().read().decode()
conn.close()

print(response)
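
The enrollment and verify calls in this thread need a profileId. Here is a minimal sketch of pulling it out of a response body, assuming the create-profile response is a JSON object with a profileId field (the enrollment response quoted further down has that field; the create response is assumed to look similar). The helper name extract_profile_id is made up for illustration and is not part of any SDK.

```python
import json

def extract_profile_id(response_text):
    """Return the 'profileId' field of a JSON response body, or None if it is missing."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        return None  # unexpected shape, e.g. a JSON list
    return data.get('profileId')

# Made-up response body for illustration only (not real service output)
sample = '{"profileId": "00000000-0000-0000-0000-000000000000"}'
print(extract_profile_id(sample))
```

If the call fails, the body is more likely an error object without that field, so the helper returns None instead of raising.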

ENROLL PROFILE 👇

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

profileId = 'YOUR_PROFILE_ID'  # comes from the response of the create profile request

with open('YOUR_WAV_FILE', 'rb') as data:
    conn.request(method='POST', 
                url=f'/speaker/verification/v2.0/text-independent/profiles/{profileId}/enrollments?ignoreMinLength=true', 
                body=data, headers=headers)
    response = conn.getresponse().read().decode()
conn.close()
print(response)

LIST PROFILES 👇

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

conn.request(method='GET', url='/speaker/verification/v2.0/text-independent/profiles?$top=10', 
             headers=headers)
response = conn.getresponse().read().decode()
conn.close()
print(response)

And you can now make your own calls using this method and following the docs link I mentioned above.
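
Since every snippet above repeats the same connect, request, read, close steps, they can be folded into one helper. This is only a sketch: the function name call_speaker_api and its parameters are made up here, and the host is simply the one used throughout this thread.

```python
import http.client

HOST = 'westus.api.cognitive.microsoft.com'  # region host used in this thread

def call_speaker_api(method, path, key, body=None, content_type=None, conn=None):
    """Send one request and return (HTTP status code, decoded body text).

    conn is optional so a stand-in object can be passed in for testing;
    by default a new HTTPSConnection to HOST is created.
    """
    headers = {'Ocp-Apim-Subscription-Key': key}
    if content_type:
        headers['Content-type'] = content_type
    conn = conn or http.client.HTTPSConnection(HOST)
    try:
        conn.request(method=method, url=path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()  # always release the connection, even on errors

# Usage (needs a real key, so it is left commented out):
# status, text = call_speaker_api(
#     'GET', '/speaker/verification/v2.0/text-independent/profiles?$top=10', 'YOUR_KEY')
```

Returning the status code alongside the body makes it easy to tell a successful call from an error response before parsing anything.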

Cheers and Good Luck ! 🍻

shubham12tomar commented 2 years ago

Hi @Lord-V15, can you please provide the identification code that comes after enrolling the voice or audio?

Lord-V15 commented 2 years ago

Not sure what you're asking for, but you can read the post-enrolment details straight from the response. For example, if you run this :

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'HIDING-THIS-PART-FOR-OBVIOUS-REASONS'}

with open('phrase-record1.wav', 'rb') as data:
    conn.request(method='POST', 
                url=f'/speaker/verification/v2.0/text-dependent/profiles/{profileId}/enrollments?ignoreMinLength=true', 
                body=data, headers=headers) # profileId comes from the response of create profile request
    response = conn.getresponse().read().decode()
    conn.close()
print(response)

You will get a response like :

{"remainingEnrollmentsCount":2,"passPhrase":"my voice is stronger than passwords","profileId":"506b52f6-8fe9-44b9-a420-be200880e2a1","enrollmentStatus":"Enrolling","enrollmentsCount":1,"enrollmentsLength":4.21,"enrollmentsSpeechLength":2.87,"audioLength":4.21,"audioSpeechLength":2.87}

I think what you're looking for is the profileId part. If you want to know whether the enrolment was successful, keep the response object, e.g. resp = conn.getresponse(), and then check it using if resp.status==200 : (conn.request( ... ) itself returns nothing, and status_code is an attribute from the requests library, not http.client). Hope this will help you in one way or another.
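
With http.client, the status code is available as .status on the object returned by conn.getresponse(). A small sketch of combining that code with the JSON body shown above follows; the function name enrollment_summary is made up, and treating any 2xx code as success is an assumption, not something the thread or the docs quoted here confirm.

```python
import json

def enrollment_summary(http_status, response_text):
    """Summarise an enrollment response as a small dict."""
    if not 200 <= http_status < 300:  # assumption: any 2xx means the call succeeded
        return {'ok': False, 'enrollment_status': None, 'remaining': None}
    data = json.loads(response_text)
    return {
        'ok': True,
        'enrollment_status': data.get('enrollmentStatus'),
        'remaining': data.get('remainingEnrollmentsCount'),
    }

# The response body quoted in this thread
sample = ('{"remainingEnrollmentsCount":2,"passPhrase":"my voice is stronger than passwords",'
          '"profileId":"506b52f6-8fe9-44b9-a420-be200880e2a1","enrollmentStatus":"Enrolling",'
          '"enrollmentsCount":1,"enrollmentsLength":4.21,"enrollmentsSpeechLength":2.87,'
          '"audioLength":4.21,"audioSpeechLength":2.87}')
print(enrollment_summary(200, sample))
# {'ok': True, 'enrollment_status': 'Enrolling', 'remaining': 2}
```

In this example remaining is 2, which suggests two more recordings are expected before the profile is fully enrolled.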

davisokuz1 commented 2 years ago

This is a long shot, but any chance you would be interested in creating the last part, which is being able to verify the voice profiles? I am trying to do this through the microphone on my laptop, though, and with Speaker Identification instead of verification.

Lord-V15 commented 2 years ago

@davisokuz1 I think I do have my project code for that, but I'll create a separate gist and share here once I'm free from work lol.

Lord-V15 commented 2 years ago

@davisokuz1 sorry work kept me busy. Here is what you needed I think. Happy Weekend !

VERIFY PROFILE

First you need your Profile ID from enrolment, or you can list all profiles and take the one you need; it will look something like profileId="364f6533-77fc-4120-b7b9-e163439e69cc" (Edit : you obviously also need a voice recording, which was a WAV file in my case). Then you can run the following :

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

with open('verify.wav', 'rb') as data:
    conn.request(method='POST', 
                url=f'/speaker/verification/v2.0/text-independent/profiles/{profileId}/verify', 
                body=data, headers=headers)
    response = conn.getresponse().read().decode()
    conn.close()
    print(response)

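and the response will be : {"recognitionResult":"Accept","score":0.6504233479499817} which you can use to handle various situations. NOTE : BY DEFAULT THE RESULT SEEMS TO BE A SIMPLE CUT AT A SCORE OF 0.5 (BELOW IS REJECT, ABOVE IS ACCEPT). I'D RECOMMEND USING YOUR OWN CONDITIONS TO JUDGE THE SCORE, e.g. I found for my case that a score of 0.65 or higher was the best cut-off, as I couldn't risk a false positive.

BONUS : RESET PROFILE (Note : this sends a DELETE on the profile URL, so expect it to remove the profile rather than just clear its enrollments. Wasn't useful for my project but here you go)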
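
A custom cut-off on the score can be sketched as below. The function name is_verified is made up, and the default of 0.65 is just the value reported above for one project, not a recommended setting.

```python
import json

def is_verified(response_text, threshold=0.65):
    """Return True if the 'score' in a verify response meets the given threshold."""
    data = json.loads(response_text)
    score = data.get('score', 0.0)  # a missing score is treated as no match
    return score >= threshold

# The verify response quoted in this thread
sample = '{"recognitionResult":"Accept","score":0.6504233479499817}'
print(is_verified(sample))                  # True
print(is_verified(sample, threshold=0.7))   # False
```

A stricter threshold reduces false accepts at the cost of rejecting more genuine speakers, so it is worth tuning on your own recordings.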

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

conn.request(method='DELETE', url=f'/speaker/verification/v2.0/text-independent/profiles/{profileId}', 
             headers=headers)
response = conn.getresponse().read().decode()
conn.close()
print(response)

Edit 2 : Maybe this Jupyter notebook will also be useful : Text Dependent Recognition

keshavbhandari commented 2 years ago

@Lord-V15 Given an audio file, do you know how to identify a profile among a bunch of profiles? Any help would be much appreciated!!
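
One possible workaround that uses only the verify call already shown in this thread: run the same audio against each enrolled profile and keep the best score. This is a rough sketch of that idea, not the service's dedicated identification feature, and it costs one request per profile. The names identify_by_verification and verify_fn are hypothetical; verify_fn stands for any function that sends the verify request for one profileId and returns the score as a float.

```python
def identify_by_verification(profile_ids, verify_fn, threshold=0.65):
    """Return (best_profile_id, best_score); the id is None if no score reaches threshold.

    verify_fn(profile_id) -> float is a hypothetical callable wrapping the verify request.
    """
    best_id, best_score = None, 0.0
    for pid in profile_ids:
        score = verify_fn(pid)
        if score > best_score:
            best_id, best_score = pid, score
    if best_score < threshold:
        return None, best_score  # nobody matched well enough
    return best_id, best_score

# Example with made-up scores standing in for real verify calls
fake_scores = {'profile-a': 0.2, 'profile-b': 0.8, 'profile-c': 0.5}
print(identify_by_verification(fake_scores.keys(), fake_scores.get))
# ('profile-b', 0.8)
```

Keeping the threshold check means an unknown speaker is reported as None instead of being forced onto the closest profile.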