googleapis / google-cloud-php

Google Cloud Client Library for PHP
https://cloud.google.com/php/docs/reference
Apache License 2.0

Google Speech PHP API - Error reg. Audio Sampling Rate 'sampleRateHertz' #1034

Closed hypercms closed 6 years ago

hypercms commented 6 years ago

I created a FLAC audio file with a 16,000 Hz sampling rate using FFMPEG and verified it as well. I tried it with and without the option 'sampleRateHertz' => 16000, and in both cases I receive the same error. The audio sampling rate is included in the header of the FLAC file, so maybe the Cloud service cannot extract the value from the file header?

I receive an error regarding the audio sampling rate, see: "error": { "code": 400, "message": "sample_rate_hertz (16000) in RecognitionConfig must either be unspecifie (truncated...)

dwsupplee commented 6 years ago

Thanks for the report @hypercms. Is there any way you could share the non-truncated version of the message, and also the message you get when you don't specify the sample rate at all?

hypercms commented 6 years ago

Here is the full response from the cloud service. I removed the file's content to save space. The sampling rate was not provided as an option in this request. I hope it helps. Thank you!

dwsupplee commented 6 years ago

Thanks! I have the relevant information and will take a look at it soon. Just FYI, I edited the post in case there was any sensitive information in the log you shared.

dwsupplee commented 6 years ago

Looking at the log, it appears the sample rate was included in the request. Is it possible the provided value (16000) differs from the value in the header? That would cause the service to reject it.
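To check whether a value such as 16000 actually matches the FLAC header, the sample rate can be read directly from the file. The sketch below is not part of the Speech library: flacStreamInfo is a hypothetical helper that decodes the mandatory STREAMINFO block, assuming the file starts with the "fLaC" marker (no ID3 tag in front) and runs on 64-bit PHP.

```php
<?php
// Hypothetical helper: decode the STREAMINFO metadata block of a FLAC file.
// Assumes the data starts with the "fLaC" marker (no ID3 tag in front) and
// 64-bit PHP, since the total sample count is a 36-bit field.
function flacStreamInfo(string $bytes): array
{
    // 4-byte marker + 4-byte block header + 34-byte STREAMINFO body
    if (strlen($bytes) < 42 || substr($bytes, 0, 4) !== 'fLaC') {
        throw new InvalidArgumentException('Not a FLAC stream: missing "fLaC" marker.');
    }
    // The low 7 bits of the block header's first byte hold the block type; STREAMINFO is type 0.
    if ((ord($bytes[4]) & 0x7F) !== 0) {
        throw new InvalidArgumentException('First metadata block is not STREAMINFO.');
    }
    // Skip min/max block size (2 + 2 bytes) and min/max frame size (3 + 3 bytes):
    // the packed fields below start at byte 18 of the file.
    $b = array_values(unpack('C*', substr($bytes, 18, 8)));

    // 20 bits sample rate | 3 bits (channels - 1) | 5 bits (bits per sample - 1) | 36 bits total samples
    $sampleRate = ($b[0] << 12) | ($b[1] << 4) | ($b[2] >> 4);
    $channels = (($b[2] >> 1) & 0x07) + 1;
    $bitsPerSample = ((($b[2] & 0x01) << 4) | ($b[3] >> 4)) + 1;
    $totalSamples = (($b[3] & 0x0F) << 32) | ($b[4] << 24) | ($b[5] << 16) | ($b[6] << 8) | $b[7];

    return [
        'sample_rate' => $sampleRate,
        'channels' => $channels,
        'bits_per_sample' => $bitsPerSample,
        'total_samples' => $totalSamples,
        // a total sample count of 0 means "unknown" in FLAC
        'duration' => ($sampleRate > 0 && $totalSamples > 0) ? $totalSamples / $sampleRate : null,
    ];
}
```

For example, flacStreamInfo(file_get_contents($path, false, null, 0, 42)) reads only the first 42 bytes of the file. If sample_rate is not 16000, either omit sampleRateHertz or re-encode the file.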

It could be helpful to share the code snippet triggering the request, if you don't mind :).

hypercms commented 6 years ago

I didn't include the sample rate in the example I sent you. Maybe the Google Speech PHP API is reading the file header already before it posts the request (-> recognize)?

Here is the interesting part of the code:

// load Composer's autoloader and import the Speech client
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Speech\SpeechClient;

// authenticate
putenv ("GOOGLE_APPLICATION_CREDENTIALS=".$mgmt_config['gs_access_json']);

// FLAC audio file
$path = "path_to_flac_file";

$result = array();

try
{
  // instantiate a client
  $speech = new SpeechClient();

  // open the audio file to be transcribed
  $data = fopen ($path, 'r');

  // audio file encoding and language; sampleRateHertz is omitted so the
  // service reads the sample rate from the FLAC header
  $options = [
      'languageCode' => 'en-US',
      'encoding' => 'FLAC',
      'enableWordTimeOffsets' => true
  ];

  // detect speech in the audio file
  $response = $speech->recognize ($data, $options);

  if ($response)
  {
    $i = 0;

    foreach ($response as $segment)
    {
      $alternative = $segment->alternatives()[0];

      $result[$i]['description'] = $alternative['transcript'];
      $result[$i]['confidence'] = $alternative['confidence'];

      // collect keywords (words longer than 5 characters) and word timings
      $temp_words = array();
      $temp_start = array();
      $temp_end = array();

      // get start and end time of each word
      foreach ($alternative['words'] as $wordInfo)
      {
        if (strlen ($wordInfo['word']) > 5) $temp_words[] = $wordInfo['word'];
        $temp_start[] = $wordInfo['startTime'];
        $temp_end[] = $wordInfo['endTime'];
      }

      if (sizeof ($temp_words) > 0) $result[$i]['keywords'] = implode (",", $temp_words);
      else $result[$i]['keywords'] = "";

      if (sizeof ($temp_start) > 0) $result[$i]['starttime'] = min ($temp_start);
      else $result[$i]['starttime'] = 0;

      if (sizeof ($temp_end) > 0) $result[$i]['endtime'] = max ($temp_end);
      else $result[$i]['endtime'] = 0;

      $i++;
    }
  }
}
catch (Exception $e)
{
  // e.g. the 400 errors returned by the service in this thread
  error_log ($e->getMessage());
}

dwsupplee commented 6 years ago

Thanks!

Maybe the Google Speech PHP API is reading the file header already before it posts the request (-> recognize)?

That could definitely be what is happening. Older versions of Google\Cloud\Speech\SpeechClient attempted to detect some relevant information in the client code on your behalf. That functionality has since been removed, because the upstream service now handles it. Is it possible to upgrade to the latest version of the Speech library? I would be curious to see if that resolves the issues you are running into.

hypercms commented 6 years ago

I downloaded the library on the 27th of April 2018. I just tried to update it with Composer, but Composer reports "nothing to install or update", so it seems this is already the latest version.

dwsupplee commented 6 years ago

Very interesting. Just to be certain, which version of the library is installed? You should be able to check by referencing this constant.
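Besides the library's version constant, Composer records the version of every installed package in composer.lock. A minimal sketch, assuming the standard lock layout ("packages" and "packages-dev" arrays of objects with "name" and "version" keys); lockedVersion is a hypothetical helper, and it reports what Composer locked, which can differ from vendor/ if composer install has not been run since the last update.

```php
<?php
// Hypothetical helper: return the version Composer locked for $package,
// or null if the package is not listed.
function lockedVersion(string $lockJson, string $package): ?string
{
    $lock = json_decode($lockJson, true);
    if (!is_array($lock)) {
        throw new InvalidArgumentException('composer.lock is not valid JSON.');
    }
    // composer.lock lists runtime and dev dependencies in two separate arrays.
    foreach (['packages', 'packages-dev'] as $section) {
        foreach ($lock[$section] ?? [] as $entry) {
            if (($entry['name'] ?? null) === $package) {
                return $entry['version'] ?? null;
            }
        }
    }
    return null;
}

// Example: echo lockedVersion(file_get_contents(__DIR__ . '/composer.lock'), 'google/cloud-speech');
```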

hypercms commented 6 years ago

We are using version 0.12.3 of Google Speech. I checked the sampling rate of the FLAC file as well; it is 16000 Hz. The system (hypercms.com) uses FFMPEG to read the metadata from the file. I am sure the sampling rate reported by FFMPEG is correct, since FFMPEG also created the audio file from a video using 16000 Hz as the sampling rate.

dwsupplee commented 6 years ago

Thanks for confirming. :)

I just ran the snippet provided on 0.12.3, using an audio file recorded using the following method:

# Sample rate: 16000, 16 bit, 1 channel audio
rec -r 16000 -b 16 -c 1 -t flac /audio/test.flac

And the request went through successfully. In the config of the request built by the library, the sampleRateHertz setting was not present; the sample rate was inferred by the upstream service:

{
  "languageCode": "en-US",
  "encoding": "FLAC",
  "enableWordTimeOffsets": true
}

I'm having a hard time understanding how the library could be attempting to infer the sample rate for you, as we removed that functionality about a year ago. Also, the fact that the message is truncated is odd, as that is a remnant of an issue we fixed some time back as well. The full message should read "sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header ([SOME_VALUE])." Would you be able to run composer show and share the libraries/versions which are installed?
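The rule in that message can also be enforced on the client side before any request is sent. A sketch only: flacRecognizeOptions is a hypothetical helper, the header rate is assumed to come from your own metadata (for example FFMPEG or the FLAC STREAMINFO block), and the option keys are the ones used in the snippet earlier in this thread.

```php
<?php
// Hypothetical helper: build recognize() options for a FLAC file. Following
// the service rule, sampleRateHertz is either left out entirely or must equal
// the rate stored in the FLAC header.
function flacRecognizeOptions(int $headerRate, ?int $requestedRate = null): array
{
    $options = [
        'languageCode' => 'en-US',
        'encoding' => 'FLAC',
        'enableWordTimeOffsets' => true,
    ];

    if ($requestedRate !== null) {
        if ($requestedRate !== $headerRate) {
            // Fail locally instead of receiving the 400 error from the service.
            throw new InvalidArgumentException(sprintf(
                'sampleRateHertz (%d) does not match the FLAC header (%d); omit it or re-encode the file.',
                $requestedRate,
                $headerRate
            ));
        }
        $options['sampleRateHertz'] = $requestedRate;
    }

    return $options;
}
```

Leaving the second argument out reproduces the request config shown above, which lets the service read the rate from the header.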

Also, it would be very helpful if you would be able to modify the instantiation of the SpeechClient to look like the following. It should help provide some valuable debug information for us. If you could shoot that over to me at dwsupplee@gmail.com we can get a better understanding of what may be occurring.

$guzzleClient = new \GuzzleHttp\Client();

$speech = new SpeechClient([
    'httpHandler' => function ($request, $options = []) use ($guzzleClient) {
        debug_print_backtrace(); // capture this output

        return $guzzleClient->send($request, $options);
    }
]);
hypercms commented 6 years ago

These are the versions I downloaded with Composer:

firebase/php-jwt                v5.0.0   A simple library to encode and decod...
google/auth                     v1.3.0   Google Auth Library for PHP
google/cloud-core               v1.19.2  Google Cloud PHP shared dependency, ...
google/cloud-speech             v0.12.3  Cloud Speech Client for PHP
google/cloud-storage            v1.4.0   Cloud Storage Client for PHP
google/cloud-videointelligence  v0.10.3  Cloud Video Intelligence Client for PHP
google/cloud-vision             v0.11.3  Cloud Vision Client for PHP
google/gax                      0.31.3   Google API Core for PHP
google/proto-client             0.37.0   Generated proto and gRPC classes for...
google/protobuf                 v3.5.2   proto library for PHP
grpc/grpc                       1.10.0   gRPC library for PHP
guzzlehttp/guzzle               6.3.3    Guzzle is a PHP HTTP client library
guzzlehttp/promises             v1.3.1   Guzzle promises library
guzzlehttp/psr7                 1.4.2    PSR-7 message implementation that al...
monolog/monolog                 1.23.0   Sends your logs to files, sockets, i...
psr/cache                       1.0.1    Common interface for caching libraries
psr/http-message                1.0.1    Common interface for HTTP messages
psr/log                         1.0.2    Common interface for logging libraries
rize/uri-template               0.3.2    PHP URI Template (RFC 6570) supports...

In the meantime I updated cloud-core to version 1.20.0. I converted another MP4 video to FLAC with the same audio options and tested it, and it worked. So I assume the content of the previously converted FLAC file was the issue, even though it played and displayed the correct audio options.
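Since re-converting the file made the error go away, it can help to pin the conversion options explicitly. The exact FFMPEG command used here is not shown in the thread; the sketch below assumes standard ffmpeg flags, and ffmpegToFlacCommand is a hypothetical helper.

```php
<?php
// Hypothetical helper: extract the audio track of a video as 16 kHz,
// mono, 16-bit FLAC. The flags are an assumption, not the original command:
// -y overwrites the output, -vn drops the video stream, -ac 1 downmixes to
// mono, -ar sets the sample rate, -sample_fmt s16 requests 16-bit samples,
// and -c:a flac selects the FLAC encoder.
function ffmpegToFlacCommand(string $input, string $output, int $sampleRate = 16000): string
{
    return sprintf(
        'ffmpeg -y -i %s -vn -ac 1 -ar %d -sample_fmt s16 -c:a flac %s',
        escapeshellarg($input),
        $sampleRate,
        escapeshellarg($output)
    );
}

// Example: exec(ffmpegToFlacCommand('video.mp4', 'audio.flac'), $out, $status);
```

After running the command, it is still worth re-checking the sample rate stored in the output file before calling recognize().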

Now I tried a FLAC audio file with several minutes of speech and received an error, because synchronous speech recognition only supports 1 minute of audio: error: { code: 400, message: Sync input too long. For audio longer than 1 min use LongRunningRecognize with a uri parameter., status: INVALID_ARGUMENT }

I found this example of an async. solution: https://cloud.google.com/speech-to-text/docs/async-recognize

Question: Does this solution also support local files (according to the description, it doesn't), or should the local audio file be split into 1 minute segments in order to work with the sync. solution? Google Video Intelligence is not limited to 1 minute per file. Why does Google Speech have this limitation for files outside of Google Cloud Storage? Thank you!

dwsupplee commented 6 years ago

Does this solution also support local files

Technically, yes, but it will still be limited to roughly 1 minute of audio.

should the local audio file be split into 1 minute segments in order to work with the sync

I would strongly advise uploading the data to a storage bucket and running the recognition against that. However, if using storage is out of the question, you could:

Why does Google Speech have this limitation for files outside of Google Cloud Storage?

I'm not sure why the limits exist as they do, but a good place to ask would be the Cloud Speech Google Group.
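If uploading to a storage bucket is not an option, one way to stay under the roughly 1 minute limit of synchronous recognition is to cut the file into shorter pieces and send each piece separately. A minimal sketch, assuming the total duration is already known (e.g. from FFMPEG's metadata) and that ffmpeg is available; planSegments and segmentCommands are hypothetical helpers, and the 55 second chunk length is an arbitrary safety margin.

```php
<?php
// Hypothetical helper: split a recording of $durationSeconds into
// [start, length] pairs of at most $maxSeconds each.
function planSegments(float $durationSeconds, float $maxSeconds = 55.0): array
{
    if ($maxSeconds <= 0) {
        throw new InvalidArgumentException('maxSeconds must be positive.');
    }
    $segments = [];
    // Multiply instead of accumulating to avoid floating-point drift.
    for ($i = 0; $i * $maxSeconds < $durationSeconds; $i++) {
        $start = $i * $maxSeconds;
        $segments[] = [$start, min($maxSeconds, $durationSeconds - $start)];
    }
    return $segments;
}

// Build one ffmpeg command per chunk: -ss (before -i) seeks to the start,
// -t limits the chunk length. The output file names are placeholders.
function segmentCommands(string $input, float $durationSeconds): array
{
    $commands = [];
    foreach (planSegments($durationSeconds) as $i => [$start, $length]) {
        $commands[] = sprintf(
            'ffmpeg -y -ss %.3F -t %.3F -i %s -c:a flac %s',
            $start,
            $length,
            escapeshellarg($input),
            escapeshellarg(sprintf('segment_%03d.flac', $i))
        );
    }
    return $commands;
}
```

Cutting at fixed offsets can split a word across two chunks, so transcripts near chunk boundaries may need review, and word time offsets from each chunk have to be shifted by that chunk's start time.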

hypercms commented 6 years ago

Thank you for the Information 👍

dwsupplee commented 6 years ago

Happy to help :). I'm going to close this out for now, as it seems the original topic has been covered.