wit-ai / pywit

Python library for Wit.ai

Unsupported content-type #145

Closed: Prateek13727 closed this issue 4 years ago

Prateek13727 commented 4 years ago

hey fellas,

Thank you for creating Wit :) It's helping me get started with my first NLP project.

My use case: I am reading a WAV file and sending it to the Wit speech API, as shown in the code at the bottom.

I was initially getting a 400 Bad Request, so I cloned the repo and performed the steps mentioned in #126 to get a more specific error message (shown below).

The error trace

Traceback (most recent call last):
  File "wit_speech.py", line 23, in <module>
    text =  RecognizeSpeech('recordings/mav_abs_1.wav', 4)
  File "wit_speech.py", line 12, in RecognizeSpeech
    resp = client.speech(f, headers={'Content-Type':'audio/wav'})
  File "/home/maverick/maverick/myGit/orador/pywit/wit/wit.py", line 90, in speech
    data=audio_file, headers=headers)
  File "/home/maverick/maverick/myGit/orador/pywit/wit/wit.py", line 46, in req
    raise WitError('Wit responded with an error: ' + json['error'])
wit.wit.WitError: Wit responded with an error: Unsupported content-type

Am I missing something here?

The code

from wit import Wit

wit_access_token = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

def RecognizeSpeech(AUDIO_FILENAME, num_seconds=5):  # num_seconds is currently unused
    client = Wit(wit_access_token)
    with open(AUDIO_FILENAME, 'rb') as f:
        # Pass headers by keyword: pywit's speech() has other optional
        # parameters before headers, so a positional dict would be
        # misinterpreted.
        resp = client.speech(f, headers={'Content-Type': 'audio/wav'})
    return resp

if __name__ == "__main__":
    text = RecognizeSpeech('recordings/mav_abs_1.wav', 4)
    print("\nYou said: {}".format(text))

App-id: 1531619810342334

Any help would be appreciated. Please let me know if any clarification is needed.

Cheers, Prateek

patapizza commented 4 years ago

Hi @Prateek13727,

How is the WAV file encoded? What is the output of sox --i recordings/mav_abs_1.wav?

Prateek13727 commented 4 years ago

hey @patapizza

The output of sox --i recordings/mav_abs_1.wav is:

Input File     : 'mav_abs_1.wav'
Channels       : 2
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:01:07.82 = 2990784 samples = 5086.37 CDDA sectors
File Size      : 12.0M
Bit Rate       : 1.41M
Sample Encoding: 16-bit Signed Integer PCM

patapizza commented 4 years ago

From the API docs:

At this time, Wit.ai is only able to process mono so you must make sure to send mono and not stereo to the API.

Also make sure you provide the right headers for sample rates, precision, etc.

Hope this helps.
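
(A hedged sketch to make this concrete, not from the original comment: after downmixing to mono, e.g. with sox recordings/mav_abs_1.wav -c 1 -r 16k mono.raw, a headerless PCM upload would spell out the encoding parameters in the Content-Type itself, per the HTTP API docs. The file name, token, and 16 kHz rate are assumptions; adjust them to match the actual file.)

from wit import Wit

client = Wit('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')  # placeholder token

# For a WAV container, 'audio/wav' is enough; for raw PCM the content
# type must carry the sample encoding, bit depth, rate, and endianness.
with open('mono.raw', 'rb') as f:
    resp = client.speech(f, headers={
        'Content-Type': 'audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little',
    })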

Prateek13727 commented 4 years ago

Thanks, @patapizza, for your reply. I will try with mono audio and get back.

The WAV file created with the command below (from the API docs) works fine with the API:

sox -d -b 16 -c 1 -r 16k sample.wav
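
(A quick way to double-check a file before uploading: a minimal sketch using only the Python standard library; the filename is a placeholder.)

import wave

# Verify the properties Wit cares about: mono, 16-bit, and the rate.
with wave.open('sample.wav', 'rb') as w:
    print('channels:', w.getnchannels())       # must be 1 (mono)
    print('sample width:', w.getsampwidth())   # bytes per sample; 2 == 16-bit
    print('rate:', w.getframerate())           # e.g. 16000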

Prateek13727 commented 4 years ago

Hey @patapizza

I have a small doubt regarding the Wit speech use case. I intend to use the wit.ai speech API for pure transcription, and I am looking for a response like the one shown below (I need the timestamps of individual words). This is basic, level-1 transcription without any training. Is this possible with the current Wit speech API? If I understand correctly, right now I get timestamps only for the entities that are extracted from the speech after training.

Cheers, Prateek

[ "items": [ { "start_time": "2.23", "end_time": "2.78", "alternatives": [ { "confidence": "0.9582", "content": "morning" } ], "type": "pronunciation" }, { "alternatives": [ { "confidence": "0.0", "content": "." } ], "type": "punctuation" }, { "start_time": "2.79", "end_time": "2.91", "alternatives": [ { "confidence": "0.861", "content": "Who" } ], "type": "pronunciation" }, { "start_time": "2.91", "end_time": "3.04", "alternatives": [ { "confidence": "0.8081", "content": "would" } ], "type": "pronunciation" }, { "alternatives": [ { "confidence": "0.0", "content": "?" } ], "type": "punctuation" }, ]