kylefoley76 closed this issue 4 years ago.
I forgot to include the traceback:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary
attr = getattr(var, n)
AttributeError: Extensions
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 287, in <module>
audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 22, in __init__
self.loop_audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 55, in loop_audio2txt
self.sample_recognize()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 88, in sample_recognize
response = operation.result()
File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Too many retries, giving up.
@kylefoley76 there was a breaking change related to enums and types: they are no longer supported in the Google Python repos.
I don't understand what you mean. The example code found here still uses enums:
https://googleapis.dev/python/speech/latest/index.html
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
client = speech_v1.SpeechClient()
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
sample_rate_hertz = 44100
language_code = 'en-US'
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}
uri = 'gs://bucket_name/file_name.flac'
audio = {'uri': uri}
response = client.recognize(config, audio)
Are you saying that this line of code is incorrect?
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
If so, what is the correct line of code then?
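For reference, here is a minimal sketch of the same FLAC request written against google-cloud-speech 2.x. It assumes that the breaking change mentioned above is the 2.0 release, which dropped the enums and types modules. Under 1.3.2 (the version reported in this issue), the enums line quoted above is still the correct form. The function name recognize_flac is hypothetical. The library is imported inside the function only so that the sketch can be loaded without it installed.

```python
from typing import List


def recognize_flac(uri: str, sample_rate_hertz: int = 44100,
                   language_code: str = "en-US") -> List[str]:
    """Synchronously transcribe a short FLAC file stored at a gs:// URI.

    Assumes google-cloud-speech >= 2.0; with 1.x, keep the enums-based
    code from the thread instead.
    """
    # Imported lazily so this module can be loaded without the library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        # 2.x spelling of enums.RecognitionConfig.AudioEncoding.FLAC
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(uri=uri)
    # 2.x methods take keyword arguments (or a single request object),
    # not positional config/audio as in the 1.x sample.
    response = client.recognize(config=config, audio=audio)
    # Keep the most likely alternative for each portion of the audio.
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```

Calling recognize_flac('gs://bucket_name/file_name.flac') would return one transcript string per recognized portion of the file.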
Hi @kylefoley76,
Have you tried using this sample for long_running_recognize?
That doesn't work. I still get
google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds the limit: 10485760 bytes
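The 10485760-byte figure in that error is the size limit for a request that carries its audio inline (10 MiB). Below is a minimal sketch that chooses between inline bytes and a Cloud Storage URI based on file size. It assumes the 1.x dict-style audio argument used throughout this thread. The names audio_source and INLINE_PAYLOAD_LIMIT are hypothetical. The rest of the request also counts toward the limit, so a file just under it may still be rejected.

```python
import os
from typing import Dict, Union

# The limit quoted in the InvalidArgument error (10485760 bytes = 10 MiB).
INLINE_PAYLOAD_LIMIT = 10_485_760


def audio_source(local_path: str, gcs_uri: str) -> Dict[str, Union[bytes, str]]:
    """Return a 1.x-style audio dict: inline bytes if small, else the gs:// URI.

    Assumes gcs_uri already points at an uploaded copy of local_path.
    """
    if os.path.getsize(local_path) < INLINE_PAYLOAD_LIMIT:
        with open(local_path, "rb") as audio_file:
            # Small enough to send in the request body itself.
            return {"content": audio_file.read()}
    # Too large to inline: let the service read the object from Cloud Storage.
    return {"uri": gcs_uri}
```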
I'm supposed to upload the file to the cloud, but the sample does not show what syntax to use once the file is in the cloud. Here is the code I use:
client = speech_v1.SpeechClient()
storage_uri = self.storage_uri
sample_rate_hertz = 16000
language_code = "en-US"
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
#encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
config = {
"sample_rate_hertz": sample_rate_hertz,
"language_code": language_code,
"encoding": encoding,
}
audio = {"uri": storage_uri}
operation = client.long_running_recognize(config, audio, timeout=240)
print(u"Waiting for operation to complete...")
self.response = operation.result()
The audio object is a dictionary, but when I use the code that you mentioned, a dictionary is not accepted; it requires a bytes object. I cannot figure out how to get a bytes object from the file uploaded to the cloud.
For example, in this code that you pointed to:
client = speech.SpeechClient()
# with io.open(self.local, 'rb') as audio_file:
# content = audio_file.read()
content = {"uri": self.storage_uri}
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code='en-US')
The content object has to be located in the cloud, so I upload it successfully as follows:
self.storage_uri = f'gs://deduction4/audio/{self.file}'
storage_client = storage.Client()
bucket_name = f'deduction4'
bucket = storage_client.get_bucket(bucket_name)
if not already_uploaded:
    bucket.blob(f'audio/{self.file}').upload_from_string(self.content, timeout=240)
self.storage_uri = f'gs://deduction4/audio/{self.file}'
self.bucket = bucket
But now how do I get the content object to be a bytes object? Otherwise I get this error:
TypeError: {'uri': 'gs://deduction4/audio/trial_1.mp3'} has type dict, but expected one of: bytes
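RecognitionAudio holds the audio in one of two fields. The content field takes raw bytes, and the uri field takes a gs:// path. The TypeError above comes from putting the {'uri': ...} dictionary into content. Below is a minimal sketch that passes the Cloud Storage path through uri instead, so no bytes object is needed at all. It assumes google-cloud-speech 1.x, as used in this thread. The names transcribe_gcs and wait_seconds are hypothetical. LINEAR16 at 16000 Hz is also an assumption and has to match whatever bytes are actually stored in the object.

```python
from typing import List


def transcribe_gcs(storage_uri: str, sample_rate_hertz: int = 16000,
                   language_code: str = "en-US",
                   wait_seconds: float = 1800) -> List[str]:
    """Long-running recognition of audio already uploaded to Cloud Storage.

    Assumes google-cloud-speech 1.x and that the object holds LINEAR16 audio
    at sample_rate_hertz; change both to match the bytes actually uploaded.
    """
    if not storage_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {storage_uri!r}")
    # Imported here so the check above works without the library installed.
    from google.cloud.speech_v1 import SpeechClient, enums, types

    # The file's location goes in uri; content only accepts raw bytes.
    audio = types.RecognitionAudio(uri=storage_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    client = SpeechClient()
    operation = client.long_running_recognize(config, audio)
    # The wait for the finished transcript is bounded here; a timeout passed
    # to long_running_recognize only covers starting the operation.
    response = operation.result(timeout=wait_seconds)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```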
Hi @kylefoley76
We can only provide support to speech-to-text related issues. Cloud Storage issues should be directed to the appropriate team. They should be able to answer your question about converting to a bytes object.
Yes, but speech-to-text requires me to upload large files to the cloud. So after I have uploaded the file, why does speech-to-text stop working? The following code is supposed to work. I'm not getting an error with the upload part; I'm getting an error with the speech-to-text part:
client = speech_v1.SpeechClient()
storage_uri = self.storage_uri
sample_rate_hertz = 16000
language_code = "en-US"
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
#encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
config = {
"sample_rate_hertz": sample_rate_hertz,
"language_code": language_code,
"encoding": encoding,
}
audio = {"uri": storage_uri}
operation = client.long_running_recognize(config, audio, timeout=240)
By the way, could you try transcribing a 30-minute audio file and see if it works? If you could transcribe a 30-minute audio file and send me the audio and the code, then maybe I could figure out what I'm doing wrong.
I need to know whether this problem can be solved. If it cannot, then I'll try a workaround: I'll have to split the audio files into smaller pieces. I don't want to do that, because some of the words will be split and hence transcribed incorrectly, but I at least need to know if you will never answer my question.
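If splitting turns out to be necessary, letting the pieces overlap slightly reduces the risk of cutting a word in half. The cost is some duplicated text at each seam, which has to be trimmed when the transcripts are joined. Below is a minimal sketch that computes overlapping chunk boundaries in milliseconds. chunk_bounds is a hypothetical helper. The boundaries could then be used to slice a pydub AudioSegment, because sound[start:end] slices by milliseconds.

```python
from typing import List, Tuple


def chunk_bounds(duration_ms: int, chunk_ms: int,
                 overlap_ms: int) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into windows of chunk_ms that overlap by overlap_ms."""
    if not 0 <= overlap_ms < chunk_ms:
        raise ValueError("overlap_ms must be >= 0 and smaller than chunk_ms")
    bounds = []
    step = chunk_ms - overlap_ms  # how far each window starts after the previous one
    start = 0
    while start < duration_ms:
        end = min(start + chunk_ms, duration_ms)
        bounds.append((start, end))
        if end == duration_ms:  # last window reached the end of the audio
            break
        start += step
    return bounds
```

With pydub, len(sound) is the duration in milliseconds, so the pieces would be [sound[s:e] for s, e in chunk_bounds(len(sound), 50_000, 2_000)].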
Hi @kylefoley76
MP3 encoding is a Beta feature and only available in v1p1beta1. You can check this sample that uses the v1p1beta1 version of the API.
Like so:
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1 import enums
client = speech_v1p1beta1.SpeechClient()
encoding = enums.RecognitionConfig.AudioEncoding.MP3
sample_rate_hertz = 44100
language_code = 'en-US'
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}
uri = 'gs://your-bucket/file.mp3'
audio = {'uri': uri}
print('Waiting on response...')
operation = client.long_running_recognize(config, audio)
response = operation.result()
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u'Transcript: {}'.format(result.alternatives[0].transcript))
    print('Confidence: {}'.format(result.alternatives[0].confidence))
Your file metadata: [screenshot not included]
I got the following error
Traceback (most recent call last):
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 332, in <module>
audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 27, in __init__
self.loop_audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 61, in loop_audio2txt
self.sample_recognize()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 110, in sample_recognize
self.response = operation.result()
File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
If MP3 encoding is only available in v1p1beta1, then maybe I should just convert the audio into some other encoding? What do you recommend?
Hi @kylefoley76
I am able to transcribe your file using the code referenced above. You can try setting audio_channel_count and enable_separate_recognition_per_channel in your configuration object:
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code, 'audio_channel_count': 2, 'enable_separate_recognition_per_channel': True}
You could also convert your files to WAV and use the non-beta version of the API.
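Below is a minimal sketch of that conversion. It assumes the ffmpeg command-line tool (recommended later in this thread) is installed. The command downmixes to mono and resamples to 16 kHz 16-bit PCM. The non-beta API would then be told the audio is LINEAR16 with sample_rate_hertz=16000 and one channel. The helper names and the 16 kHz mono choice are assumptions, not requirements of the API.

```python
import subprocess
from typing import List


def ffmpeg_wav_command(src: str, dst: str, sample_rate: int = 16000,
                       channels: int = 1) -> List[str]:
    """Build an ffmpeg command that writes 16-bit little-endian PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, e.g. an MP3
        "-ac", str(channels),     # output channel count
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-acodec", "pcm_s16le",   # 16-bit signed little-endian PCM (LINEAR16)
        dst,
    ]


def convert_to_wav(src: str, dst: str, **kwargs) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(ffmpeg_wav_command(src, dst, **kwargs), check=True)
```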
I got the following error
Traceback (most recent call last):
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 334, in <module>
audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 27, in __init__
self.loop_audio2txt()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 61, in loop_audio2txt
self.sample_recognize()
File "/Users/kylefoley/codes/pcode/other/text2audio.py", line 112, in sample_recognize
self.response = operation.result()
File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
Maybe the problem is in how I convert the MP3 into raw data here. self.local is the local audio file. I then upload the result to the cloud.
from pydub import AudioSegment
sound = AudioSegment.from_mp3(self.local)
self.content = sound.raw_data
self.duration = int(sound.duration_seconds)
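One possible explanation, offered as a hypothesis rather than a diagnosis: AudioSegment.raw_data is headerless PCM sample data, not MP3. An object uploaded from self.content would therefore be PCM even if its name ends in .mp3, and a config declaring MP3 (or FLAC) would not match it. If that is the case, there are two options. One is to upload the original MP3 file unchanged (for example with the storage library's Blob.upload_from_filename) and keep the MP3 config. The other is to describe the raw data as LINEAR16 using the segment's own frame rate and channel count. Below is a minimal sketch of the second option. raw_pcm_config is a hypothetical helper, and it assumes the samples are 16-bit little-endian, which is what LINEAR16 expects.

```python
from typing import Dict, Union


def raw_pcm_config(frame_rate: int, channels: int, sample_width: int,
                   language_code: str = "en-US") -> Dict[str, Union[int, str]]:
    """Config fields describing headerless PCM such as AudioSegment.raw_data.

    The caller still sets "encoding" to LINEAR16, e.g. with
    enums.RecognitionConfig.AudioEncoding.LINEAR16 in google-cloud-speech 1.x.
    """
    # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
    if sample_width != 2:
        raise ValueError(
            f"expected 2-byte samples, got {sample_width}; "
            "try sound.set_sample_width(2) before reading raw_data"
        )
    return {
        # Must be the segment's real rate (e.g. 44100), not a guessed 16000.
        "sample_rate_hertz": frame_rate,
        "audio_channel_count": channels,
        "language_code": language_code,
    }
```

Usage would be config = raw_pcm_config(sound.frame_rate, sound.channels, sound.sample_width), followed by config['encoding'] = enums.RecognitionConfig.AudioEncoding.LINEAR16.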
Can you share the code that generated this error? I was successful in transcribing your original MP3 file with the code I referenced in my previous reply.
It would be the code I posted here https://github.com/googleapis/python-speech/issues/52#issue-693039016 except that the sample_recognize function is changed to:
def sample_recognize(self):
    client = speech_v1p1beta1.SpeechClient()
    encoding = enums.RecognitionConfig.AudioEncoding.MP3
    sample_rate_hertz = 44100
    language_code = 'en-US'
    config = {'encoding': encoding,
              'sample_rate_hertz': sample_rate_hertz,
              'audio_channel_count': 2,
              'enable_separate_recognition_per_channel': True,
              'language_code': language_code}
    audio = {'uri': self.storage_uri}
    print('Waiting on response...')
    operation = client.long_running_recognize(config, audio)
    self.response = operation.result()
How do you convert the MP3 to raw? Could you share your code along with the conversion to raw?
Are you also sure that there isn't some config that I have to tell Google about regarding my profile?
@kylefoley76 you do not need to convert to another encoding. You can keep your MP3 format file and send the request that way. You will get a response (transcription) from the service using the code you just shared with the original MP3 file.
We can only support issues related to our service. File conversion falls outside of our domain. However, you are free to decide on how you want to send your transcription request, including file format, etc.
I answered how I convert it to raw in this comment: https://github.com/googleapis/python-speech/issues/52#issuecomment-694366892
I convert it to raw in the following way:
from pydub import AudioSegment
sound = AudioSegment.from_mp3(self.local)
self.content = sound.raw_data
self.duration = int(sound.duration_seconds)
How do you do it?
@kylefoley76 check this documentation to see how to perform file conversion.
That documentation is just a general document about converting speech to text. It does not address my specific problem, and I don't have time to read it. Could you just share the syntax you used to convert my video to raw? The syntax I use to convert audio files to raw works for small files, so I seriously doubt that my problem lies with the conversion to raw. In any case, you got it to work, so can you just show me the complete code you used? If the code worked on your computer, then it should work on my computer. Better yet, can you prove that you got it to work by uploading the output? It's just a summary of Kafka's The Trial, so there are no privacy concerns here.
@kylefoley76, if I understand your concern correctly, you would like to know how we demux (extract) the audio track from a video file?
We advocate the use of FFmpeg for manipulating audio files, as described in our documentation here and here. That last link shows the syntax for using ffmpeg for extracting audio tracks:
ffmpeg -i video-input-file audio-output-file
Another option available for you for transcribing videos is the Video Intelligence API.
Please reopen this issue if you encounter further issues with the Speech-to-Text API.
@telpirion No, I'm not trying to extract an audio file from a video file. In my code it clearly states that I'm working with an mp3 file, here:
sound = AudioSegment.from_mp3(self.local)
Would you kindly reopen my issue and provide some guidance on this matter? I have been trying to solve this problem for 14 days and each recommendation so far has failed. I think you can understand why I'm feeling frustrated.
How do you reopen this issue?
FYI @JustinBeckwith
@kylefoley76 , I can see that you're getting frustrated. We have provided to you all of the guidance that we can.
I mentioned the FFmpeg tool previously; you can also use it for converting from MP3 to raw (as you mentioned here). The ffmpeg tool is how we recommend converting audio files (per all the links I sent in my previous response).
Looking at your code, I see that you are using pydub, which is not owned or maintained by Google. If you are encountering troubles with that library, I suggest that you follow up on the GitHub repo for pydub.
Just to come back around here, we try to keep this repository limited to bug reports, and feature requests. For questions on usage or general Python questions, I'd suggest taking things over to https://stackoverflow.com/. I'm going to close this out for now. If you run into any bugs, please do let us know!
Environment: google-cloud-speech 1.3.2, Python 3.8, macOS 10.14.4
I'm able to use this software and get it working when the files are smaller than roughly 4 MB. Anything larger than that causes problems. I have added the timeout keyword, but that does not help. I'm not sure whether you can access the exact file I use on my Google Cloud account, though I'm pretty sure you can. In any case, the name of the file is trial_1.mp3. The file in question can be found here: https://drive.google.com/file/d/1GFoA3ukqZVcJwKYvGqjXD8bMYbc1Tk2n/view?usp=sharing
There are two lines of code that give back different error messages. This line:
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
gives back the error message GoogleAPICallError.
And this line:
encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
gives back the error message AttributeError: Extensions.
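Since FLAC and LINEAR16 give different errors, it may help to check what the uploaded bytes actually are before choosing an encoding. Below is a minimal sketch that looks at the first bytes of a file for common container signatures: fLaC for FLAC, RIFF/WAVE for WAV, OggS for Ogg, and an ID3 tag or MPEG frame sync for MP3. sniff_audio and sniff_file are hypothetical helpers. These checks are heuristics, not a full format parser. Headerless PCM, such as pydub's raw_data, has no signature and comes back as None.

```python
from typing import Optional


def sniff_audio(header: bytes) -> Optional[str]:
    """Guess the container format from the first few bytes of a file."""
    if header.startswith(b"fLaC"):
        return "FLAC"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header.startswith(b"OggS"):
        return "OGG"
    if header.startswith(b"ID3"):
        return "MP3"  # ID3v2 tag in front of MPEG audio frames
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MP3"  # bare MPEG audio frame sync (11 set bits); heuristic only
    return None  # no known signature, e.g. headerless raw PCM


def sniff_file(path: str) -> Optional[str]:
    # Twelve bytes are enough for every signature checked above.
    with open(path, "rb") as f:
        return sniff_audio(f.read(12))
```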