googleapis / python-speech

This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-speech
Apache License 2.0

AttributeError: Extensions, google.api_core.exceptions.GoogleAPICallError: None Too many retries, giving up. #52

Closed kylefoley76 closed 4 years ago

kylefoley76 commented 4 years ago

Google-cloud-speech, version 1.3.2 Python 3.8 Mac OS 10.14.4

I'm able to use this software and get it working when the files are smaller than roughly 4 MB. With anything larger than that I have problems. I have added the timeout keyword, but that does not help. I'm not sure whether you can access the exact file I use on my Google Cloud; I'm pretty sure you can. In any case, the name of the file is trial_1.mp3, and it can be found here: https://drive.google.com/file/d/1GFoA3ukqZVcJwKYvGqjXD8bMYbc1Tk2n/view?usp=sharing

There are two lines of code which give back different error messages. This line:

encoding = enums.RecognitionConfig.AudioEncoding.FLAC

gives back the error message GoogleAPICallError.

And this line:

encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16

gives back the error message AttributeError: Extensions

class audio2txt:
    def __init__(self):
        str1 = '/users/kylefoley/codes/' + "My Project 999999.json"
        os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = str1
        self.all_txt = []
        self.loop_audio2txt()

    def loop_audio2txt(self):
        str1 = f'/users/kylefoley/downloads/audio/'
        self.files = os.listdir(str1)
        self.files.sort()
        for e, file in en(self.files):
            if file[0] != '.':
                p(f'{e} of {len(self.files)}')
                self.file = file
                self.local = f'{str1}{file}'
                self.storage_uri = f'gs://deduction4/audio/{file}'
                self.conver2raw()
                self.upload2cloud(1)
                self.sample_recognize()

    def conver2raw(self):
        sound = AudioSegment.from_mp3(self.local)
        self.content = sound.raw_data
        self.duration = int(sound.duration_seconds)
        return

    def upload2cloud(self, already_uploaded=0):
        storage_client = storage.Client()
        bucket_name = f'deduction4'
        bucket = storage_client.get_bucket(bucket_name)
        if not already_uploaded:
            bucket.blob(f'audio/{self.file}').upload_from_string(self.content, timeout=240)
        self.storage_uri = f'gs://deduction4/audio/{self.file}'
        self.bucket = bucket
        return

    def sample_recognize(self):
        client = speech_v1.SpeechClient()
        storage_uri = self.storage_uri
        sample_rate_hertz = 16000
        language_code = "en-US"
        encoding = enums.RecognitionConfig.AudioEncoding.FLAC
        #encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
        config = {
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "encoding": encoding,
        }
        audio = {"uri": storage_uri}
        operation = client.long_running_recognize(config, audio, timeout=240)
        print(u"Waiting for operation to complete...")
        response = operation.result()
        for result in response.results:
            alternative = result.alternatives[0]
            self.all_txt.append(alternative)
        time.sleep(2)
kylefoley76 commented 4 years ago

I forgot to put the traceback

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary
    attr = getattr(var, n)
AttributeError: Extensions

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 287, in <module>
    audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 22, in __init__
    self.loop_audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 55, in loop_audio2txt
    self.sample_recognize()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 88, in sample_recognize
    response = operation.result()
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Too many retries, giving up.
munkhuushmgl commented 4 years ago

@kylefoley76 there was a breaking change related to enums and types: they are no longer supported in the Google Python repos.

kylefoley76 commented 4 years ago

I don't understand what you mean. The example code still uses enums found here https://googleapis.dev/python/speech/latest/index.html

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums

client = speech_v1.SpeechClient()

encoding = enums.RecognitionConfig.AudioEncoding.FLAC
sample_rate_hertz = 44100
language_code = 'en-US'
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}
uri = 'gs://bucket_name/file_name.flac'
audio = {'uri': uri}

response = client.recognize(config, audio)

Are you saying that this line of code is incorrect?

encoding = enums.RecognitionConfig.AudioEncoding.FLAC

If so, what is the correct line of code then?

b-loved-dreamer commented 4 years ago

Hi @kylefoley76 ,

Have you tried using this sample for long_running_recognize?

kylefoley76 commented 4 years ago

That doesn't work. I still get

google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds the limit: 10485760 bytes

I'm supposed to upload it to the cloud but they do not show what syntax to use when you upload it to the cloud. With the following code

        client = speech_v1.SpeechClient()
        storage_uri = self.storage_uri
        sample_rate_hertz = 16000
        language_code = "en-US"
        encoding = enums.RecognitionConfig.AudioEncoding.FLAC
        #encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
        config = {
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "encoding": encoding,
        }
        audio = {"uri": storage_uri}
        operation = client.long_running_recognize(config, audio, timeout=240)
        print(u"Waiting for operation to complete...")
        self.response = operation.result()

The audio object is a dictionary, but when you use the code that you mentioned, a dictionary is not accepted; a bytes object is required instead. I cannot figure out how to get a bytes object from the file uploaded to the cloud.

For example in this code that you pointed to

        client = speech.SpeechClient()

        # with io.open(self.local, 'rb') as audio_file:
        #     content = audio_file.read()
        content = {"uri": self.storage_uri}

        audio = types.RecognitionAudio(content=content)
        config = types.RecognitionConfig(
            encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code='en-US')

The content object has to be located on the cloud. So I upload to the cloud successfully as follows:

        self.storage_uri = f'gs://deduction4/audio/{self.file}'
        storage_client = storage.Client()
        bucket_name = f'deduction4'
        bucket = storage_client.get_bucket(bucket_name)
        if not already_uploaded:
            bucket.blob(f'audio/{self.file}').upload_from_string(self.content, timeout=240)
        self.storage_uri = f'gs://deduction4/audio/{self.file}'
        self.bucket = bucket

But now how do I get the content object to be a bytes object? Otherwise I get this error

TypeError: {'uri': 'gs://deduction4/audio/trial_1.mp3'} has type dict, but expected one of: bytes
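
A note on the TypeError above: in `RecognitionAudio`, `content` expects the raw audio bytes, while a Cloud Storage path belongs in the separate `uri` field (with the `types` helper, that would be `types.RecognitionAudio(uri=...)` rather than `content=...`). The sketch below only illustrates that choice using plain dicts, which is the form the earlier `long_running_recognize` call in this thread already uses. The helper name `build_audio_arg` is hypothetical, and the size check simply reuses the 10485760-byte figure from the error message; it is an assumption that comparing the file size against it is a good enough guard.

```python
import os

# Inline-payload limit quoted in the InvalidArgument error earlier in the thread.
# Assumption: the file size is a reasonable proxy for the request payload size;
# files just under the limit may still be rejected because of request overhead.
MAX_INLINE_BYTES = 10485760


def build_audio_arg(source: str) -> dict:
    """Return a dict shaped like RecognitionAudio for `source` (hypothetical helper).

    gs:// paths go in the `uri` field; local files are read into `content`.
    """
    if source.startswith("gs://"):
        # Cloud Storage object: pass the path as a string in `uri`,
        # never as the value of `content`.
        return {"uri": source}

    size = os.path.getsize(source)
    if size > MAX_INLINE_BYTES:
        raise ValueError(
            f"{source} is {size} bytes; upload it and pass a gs:// URI instead"
        )
    with open(source, "rb") as fh:
        # Local file: `content` must be the audio bytes themselves.
        return {"content": fh.read()}
```

With the path from the error message, `build_audio_arg('gs://deduction4/audio/trial_1.mp3')` yields `{'uri': 'gs://deduction4/audio/trial_1.mp3'}`, which matches the `audio = {"uri": storage_uri}` form used with `long_running_recognize`.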
b-loved-dreamer commented 4 years ago

Hi @kylefoley76

We can only provide support to speech-to-text related issues. Cloud Storage issues should be directed to the appropriate team. They should be able to answer your question about converting to a bytes object.

kylefoley76 commented 4 years ago

Yes, but speech to text requires me to upload large files to the cloud. So after I have uploaded it to the cloud, why does speech to text stop working? The following code is supposed to work. I'm not getting an error with the upload part; I'm getting an error with the speech to text part.

        client = speech_v1.SpeechClient()
        storage_uri = self.storage_uri
        sample_rate_hertz = 16000
        language_code = "en-US"
        encoding = enums.RecognitionConfig.AudioEncoding.FLAC
        #encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
        config = {
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "encoding": encoding,
        }
        audio = {"uri": storage_uri}
        operation = client.long_running_recognize(config, audio, timeout=240)

By the way, can you just go ahead and try to transcribe a 30 minute audio and see if it works? If you could transcribe a 30 minute audio and send me the audio and the code then maybe I could figure out what I'm doing wrong.

kylefoley76 commented 4 years ago

I need to know if this problem cannot be solved. If it cannot be solved, then I'll try a workaround: I'll have to split the audio files into smaller pieces. I don't want to do that because some of the words will be split and hence transcribed incorrectly, but I at least need to know if you will never answer my question.

b-loved-dreamer commented 4 years ago

Hi @kylefoley76

MP3 encoding is a Beta feature and only available in v1p1beta1. You can check this sample that uses the v1p1beta1 version of the API.

Like so:

from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1 import enums

client = speech_v1p1beta1.SpeechClient()

encoding = enums.RecognitionConfig.AudioEncoding.MP3
sample_rate_hertz = 44100
language_code = 'en-US'

config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}

uri = 'gs://your-bucket/file.mp3'
audio = {'uri': uri}

print('Waiting on reponse...')
operation = client.long_running_recognize(config, audio)

response = operation.result()

for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u'Transcript: {}'.format(result.alternatives[0].transcript))
    print('Confidence: {}'.format(result.alternatives[0].confidence))

Your file metadata:

[Image: Screen Shot 2020-09-16 at 10 43 04 AM]
kylefoley76 commented 4 years ago

I got the following error

Traceback (most recent call last):
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 332, in <module>
    audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 27, in __init__
    self.loop_audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 61, in loop_audio2txt
    self.sample_recognize()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 110, in sample_recognize
    self.response = operation.result()
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
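
Since this error points at the encoding or channel config, one way to narrow it down is to look at the first bytes of the object that was actually uploaded and compare them with the encoding declared in the config. The following is a minimal sketch of such a check, not part of any Google library: the function name `sniff_audio_container` is hypothetical, and the MP3 frame-sync test is only a heuristic (headerless PCM can start with the same byte pattern by coincidence).

```python
def sniff_audio_container(head: bytes) -> str:
    """Best-effort guess of the audio container from a file's first bytes."""
    if head[:4] == b"fLaC":
        # FLAC streams begin with the ASCII marker "fLaC".
        return "FLAC"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        # WAV files are RIFF containers whose form type is "WAVE".
        return "WAV"
    if head[:3] == b"ID3":
        # MP3 files commonly start with an ID3v2 tag.
        return "MP3"
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        # MPEG audio frame sync: 11 set bits. Heuristic only, since raw
        # samples can produce the same two bytes.
        return "MP3"
    return "unknown (possibly headerless raw PCM)"
```

For a local copy, something like `sniff_audio_container(open(path, 'rb').read(12))` would be enough; if the result disagrees with the `encoding` value in the request, the mismatch is a plausible cause of the error.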
kylefoley76 commented 4 years ago

If mp3 encoding is only available in such and such, then maybe I should just convert the audio into some other form of encoding? What do you recommend?

b-loved-dreamer commented 4 years ago

Hi @kylefoley76

I am able to transcribe your file using the code referenced above. You can try setting audio_channel_count and enable_separate_recognition_per_channel in your configuration object:

config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code, 'audio_channel_count': 2, 'enable_separate_recognition_per_channel': True}

You could also convert your files to WAV and use the non-beta version of the API.

kylefoley76 commented 4 years ago

I got the following error

Traceback (most recent call last):
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 334, in <module>
    audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 27, in __init__
    self.loop_audio2txt()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 61, in loop_audio2txt
    self.sample_recognize()
  File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 112, in sample_recognize
    self.response = operation.result()
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.

Maybe the problem is when I convert the mp3 into a raw file here. self.local is the local file of the audio. I then upload that to the cloud.

from pydub import AudioSegment

        sound = AudioSegment.from_mp3(self.local)
        self.content = sound.raw_data
        self.duration = int(sound.duration_seconds)
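
One possible explanation, which nobody in this thread confirms: pydub's `raw_data` is headerless PCM, yet the upload code above stores those bytes under the original `.mp3` object name, and the request then declares FLAC or MP3 encoding. If that mismatch is the cause, two options follow from what has been said so far: upload the original MP3 file unchanged, or wrap the PCM in a WAV container, as suggested earlier in the thread. Below is a minimal sketch of the second option using only the standard library. The function name `pcm_to_wav_bytes` is hypothetical, and the parameter values are assumptions that would have to come from the decoded audio (pydub exposes them as `sound.channels`, `sound.sample_width` and `sound.frame_rate`).

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, channels: int, sample_width: int, frame_rate: int) -> bytes:
    """Wrap headerless PCM samples in a WAV (RIFF) container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)      # 1 = mono, 2 = stereo
        wav_file.setsampwidth(sample_width)  # bytes per sample, e.g. 2 for 16-bit
        wav_file.setframerate(frame_rate)    # samples per second
        wav_file.writeframes(pcm)            # header sizes are patched on close
    return buf.getvalue()
```

The result could then be uploaded under a `.wav` object name, with the request using the LINEAR16 encoding and the same sample rate and channel count that were passed to this function.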
b-loved-dreamer commented 4 years ago

Can you share the code that generated this error? I was successful in transcribing your original MP3 file with the code I referenced in my previous reply.

kylefoley76 commented 4 years ago

It would be the code I posted here https://github.com/googleapis/python-speech/issues/52#issue-693039016 except that the sample_recognize function is changed to

    def sample_recognize(self):
        client = speech_v1p1beta1.SpeechClient()
        encoding = enums.RecognitionConfig.AudioEncoding.MP3
        sample_rate_hertz = 44100
        language_code = 'en-US'
        config = {'encoding': encoding,
        'sample_rate_hertz': sample_rate_hertz,
        'audio_channel_count':2,
        'enable_separate_recognition_per_channel':True,
        'language_code': language_code}
        audio = {'uri': self.storage_uri}
        print('Waiting on reponse...')
        operation = client.long_running_recognize(config, audio)
        self.response = operation.result()

How do you convert the mp3 to raw? Could you share your code along with the conversion to raw?

kylefoley76 commented 4 years ago

Are you also sure that there isn't some config that I have to tell Google about regarding my profile?

b-loved-dreamer commented 4 years ago

@kylefoley76 you do not need to convert to another encoding. You can keep your MP3 format file and send the request that way. You will get a response (transcription) from the service using the code you just shared with the original MP3 file.

We can only support issues related to our service. File conversion falls outside of our domain. However, you are free to decide on how you want to send your transcription request, including file format, etc.

kylefoley76 commented 4 years ago

I answered how I convert it to raw with comment https://github.com/googleapis/python-speech/issues/52#issuecomment-694366892

I convert it to raw in the following way:

from pydub import AudioSegment

        sound = AudioSegment.from_mp3(self.local)
        self.content = sound.raw_data
        self.duration = int(sound.duration_seconds)

How do you do it?

b-loved-dreamer commented 4 years ago

@kylefoley76 check this documentation to see how to perform file conversion.

kylefoley76 commented 4 years ago

That documentation is just a general document about converting speech to text. It does not target my specific problem. I don't have time to read it. Could you just share with me the syntax you used to convert my video to raw? The syntax I use to convert audio files to raw works for small files, so I seriously doubt that my problem lies with the conversion to raw. In any case, you got it to work. So can you just show me the complete code you used to get it to work? If the code worked on your computer then it should work on my computer. Better yet, can you prove to me that you got it to work by uploading the output? It's just a summary of Kafka's Trial, so there are no privacy concerns here.

telpirion commented 4 years ago

@kylefoley76, if I understand your concern correctly, you would like to know how we demux (extract) the audio track from a video file?

We advocate the use of FFmpeg for manipulating audio files, as described in our documentation here and here. That last link shows the syntax for using ffmpeg to extract audio tracks:

ffmpeg -i video-input-file audio-output-file

Another option available for you for transcribing videos is the Video Intelligence API.

Please reopen this issue if you encounter further issues with the Speech-to-Text API.

kylefoley76 commented 4 years ago

@telpirion No, I'm not trying to extract an audio file from a video file. In my code it clearly states that I'm working with an mp3 file, here:

sound = AudioSegment.from_mp3(self.local)

Would you kindly reopen my issue and provide some guidance on this matter? I have been trying to solve this problem for 14 days and each recommendation so far has failed. I think you can understand why I'm feeling frustrated.

kylefoley76 commented 4 years ago

How do you reopen this issue?

telpirion commented 4 years ago

FYI @JustinBeckwith

@kylefoley76 , I can see that you're getting frustrated. We have provided to you all of the guidance that we can.

I mentioned the FFmpeg tool previously; you can also use it to convert from MP3 to raw (as you mentioned here). The ffmpeg tool is how we recommend converting audio files (per all the links I sent in my previous response).

Looking at your code, I see that you are using pydub, which is not owned or maintained by Google. If you are encountering troubles with that library, I suggest that you follow up on the GitHub repo for pydub.

JustinBeckwith commented 4 years ago

Just to come back around here, we try to keep this repository limited to bug reports and feature requests. For questions on usage or general Python questions, I'd suggest taking things over to https://stackoverflow.com/. I'm going to close this out for now. If you run into any bugs, please do let us know!