google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

[Invalid][Cloud] Speaker tag is not accurate #34

Closed balavenkatesh3322 closed 5 years ago

balavenkatesh3322 commented 5 years ago

Describe the bug

I tested speaker diarization on my audio file, and the result is not accurate. I have attached the audio file (speaker_tag issue.wav) and my Python code. Is there a problem with my Python code or with the audio file?

To Reproduce

This is my Python code for speaker diarization.

from google.cloud import speech_v1p1beta1 as speech
from google.oauth2 import service_account
import os
client = speech.SpeechClient(credentials=service_account.Credentials.from_service_account_file(os.getenv("GOOGLE_APPLICATION_CREDENTIALS")))

#audio = speech.types.RecognitionAudio(content=content)

audio = speech.types.RecognitionAudio(uri = 'STORAGE_AUDIO_URL')

config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=48000,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)

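operation = client.long_running_recognize(config, audio)

# Block until the long-running job finishes (up to 1000 seconds).
response = operation.result(timeout=1000)

# With diarization enabled, the word list of the last result covers the whole audio.
result = response.results[-1]

words_info = result.alternatives[0].words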

# Printing out the output:
for word_info in words_info:
    print("word: '{}', speaker_tag: {}".format(word_info.word,
                                               word_info.speaker_tag))
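
To make the per-word tags easier to read, consecutive words with the same tag can be collapsed into speaker turns. This is only a minimal sketch: `Word` and `group_turns` are hypothetical names introduced for illustration, and `Word` is a stand-in for the API's word objects (assumed to expose `word` and `speaker_tag`, as in the loop above) so the snippet runs without credentials.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the API's word objects; only the two
# attributes used in the loop above are modelled.
Word = namedtuple("Word", ["word", "speaker_tag"])


def group_turns(words):
    """Collapse consecutive words sharing a speaker_tag into (tag, text) turns."""
    return [
        (tag, " ".join(w.word for w in run))
        for tag, run in groupby(words, key=lambda w: w.speaker_tag)
    ]


# A few words taken from the output reported in this issue.
sample = [Word("and", 2), Word("my", 1), Word("throat", 2), Word("really", 2)]
for tag, text in group_turns(sample):
    print("speaker {}: {}".format(tag, text))
```

With a real response, `words_info` could be passed in place of `sample`, assuming its items carry the same two attributes.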

Data samples

Audio file google drive link here

Output for the audio file above:

word: 'he', speaker_tag: 2
word: 'sighed', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'brings', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'today', speaker_tag: 2
word: 'I', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'my', speaker_tag: 1
word: 'throat', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'itchy', speaker_tag: 2
word: 'okay', speaker_tag: 2
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'check', speaker_tag: 2
word: 'seems', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'one', speaker_tag: 2
word: 'or', speaker_tag: 2
word: 'two', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'to', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'any', speaker_tag: 2
word: 'medication', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'dosage', speaker_tag: 2
word: 'will', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'animal', speaker_tag: 1
word: 'okay', speaker_tag: 1
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'look', speaker_tag: 2
word: 'at', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'it's', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'got', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'shot', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'intensity', speaker_tag: 2
word: 'might', speaker_tag: 2
word: 'be', speaker_tag: 2
word: 'low', speaker_tag: 2
word: 'why', speaker_tag: 2
word: 'don't', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'continue', speaker_tag: 2
word: 'taking', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'Tylenol', speaker_tag: 2
word: 'for', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'draw', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'write', speaker_tag: 2
word: 'some', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'syrup', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'if', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'pharmacy', speaker_tag: 2
word: 'thank', speaker_tag: 2
word: 'you', speaker_tag: 2

The output above does not match the audio file. Nearly every word is tagged as speaker 2, even though the recording contains two speakers. Please check the audio file against the output.
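
One way to quantify the imbalance is to tally the tags in the printed output. This is a rough sketch, assuming the output has been saved as text in the `word: '...', speaker_tag: N` format shown above; `count_speaker_tags` is a hypothetical helper name, not part of any Google library.

```python
import re
from collections import Counter


def count_speaker_tags(text):
    """Count how many words were assigned to each speaker_tag in printed output."""
    tags = re.findall(r"speaker_tag:\s*(\d+)", text)
    return Counter(int(t) for t in tags)


# A short excerpt of the output reported in this issue.
sample = (
    "word: 'and', speaker_tag: 2 "
    "word: 'my', speaker_tag: 1 "
    "word: 'throat', speaker_tag: 2"
)
print(count_speaker_tags(sample))
```

Running it over the full output would show how few words were assigned to speaker 1.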

Versions

google-cloud-speech==0.36.0

wq2012 commented 5 years ago

This question is about Google Cloud diarization API, which is completely unrelated to UIS-RNN.

Please contact the customer service.