This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
I tested speaker diarization on my audio file, and the result is not accurate. I have attached the audio file (speaker_tag issue.wav) and my Python code.
Is there a problem with my Python code or with the audio file?
To Reproduce
This is my python code for speaker diarization.
from google.cloud import speech_v1p1beta1 as speech
from google.oauth2 import service_account
import os
client = speech.SpeechClient(credentials=service_account.Credentials.from_service_account_file(os.getenv("GOOGLE_APPLICATION_CREDENTIALS")))
#audio = speech.types.RecognitionAudio(content=content)
audio = speech.types.RecognitionAudio(uri='STORAGE_AUDIO_URL')
config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=48000,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)
operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=1000)
result = response.results[-1]
words_info = result.alternatives[0].words
# Printing out the output:
for word_info in words_info:
    print("word: '{}', speaker_tag: {}".format(word_info.word,
                                               word_info.speaker_tag))
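To make the per-word tags easier to compare against the audio, the words can be collapsed into consecutive speaker turns. This is only a minimal sketch: `group_by_speaker` is a hypothetical helper (not part of google-cloud-speech), and the sample pairs are a short excerpt of the output reported in this issue rather than a live API response.

```python
from itertools import groupby


def group_by_speaker(tagged_words):
    """Collapse (word, speaker_tag) pairs into consecutive speaker turns.

    Hypothetical helper for inspection only; it does not call the API.
    """
    turns = []
    # groupby only merges adjacent items, which is what a "turn" means here.
    for tag, group in groupby(tagged_words, key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in group)))
    return turns


# Excerpt of the (word, speaker_tag) pairs reported in this issue.
sample = [("headache", 2), ("and", 2), ("my", 1),
          ("throat", 2), ("really", 2), ("itchy", 2)]

for tag, text in group_by_speaker(sample):
    print("speaker {}: {}".format(tag, text))
```

With the real response, the input could be built as `[(w.word, w.speaker_tag) for w in words_info]`.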
Data samples
Audio file google drive link here
Output for the above audio file:
word: 'he', speaker_tag: 2
word: 'sighed', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'brings', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'today', speaker_tag: 2
word: 'I', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'my', speaker_tag: 1
word: 'throat', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'itchy', speaker_tag: 2
word: 'okay', speaker_tag: 2
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'check', speaker_tag: 2
word: 'seems', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'one', speaker_tag: 2
word: 'or', speaker_tag: 2
word: 'two', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'to', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'any', speaker_tag: 2
word: 'medication', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'dosage', speaker_tag: 2
word: 'will', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'animal', speaker_tag: 1
word: 'okay', speaker_tag: 1
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'look', speaker_tag: 2
word: 'at', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'it's', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'got', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'shot', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'intensity', speaker_tag: 2
word: 'might', speaker_tag: 2
word: 'be', speaker_tag: 2
word: 'low', speaker_tag: 2
word: 'why', speaker_tag: 2
word: 'don't', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'continue', speaker_tag: 2
word: 'taking', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'Tylenol', speaker_tag: 2
word: 'for', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'draw', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'write', speaker_tag: 2
word: 'some', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'syrup', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'if', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'pharmacy', speaker_tag: 2
word: 'thank', speaker_tag: 2
word: 'you', speaker_tag: 2
The above output does not match the audio file. Almost all words are tagged as speaker 2 (only 'my', 'animal', and one 'okay' are tagged as speaker 1). Please check the audio file against the output.
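To quantify how skewed the tags are, the printed output can be parsed and tallied per speaker. This is a rough sketch: `count_speaker_tags` and the regular expression are my own assumptions about the printed format used above, and the excerpt is a shortened sample of the reported output.

```python
import re
from collections import Counter

# Matches the format printed by the script: word: '<text>', speaker_tag: <n>
# The non-greedy group also handles words that contain an apostrophe.
PAIR = re.compile(r"word: '(.*?)',\s*speaker_tag:\s*(\d+)")


def count_speaker_tags(printed_output):
    """Count how many words were assigned to each speaker tag (hypothetical helper)."""
    return Counter(int(tag) for _, tag in PAIR.findall(printed_output))


# Shortened excerpt of the output reported in this issue.
excerpt = ("word: 'and', speaker_tag: 2 word: 'my', speaker_tag: 1 "
           "word: 'throat', speaker_tag: 2 word: 'it's', speaker_tag: 2")

for tag, count in sorted(count_speaker_tags(excerpt).items()):
    print("speaker {}: {} words".format(tag, count))
```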
Versions
google-cloud-speech==0.36.0