microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

Azure Whisper AI is not working #1916

Closed Roopesh-Bharatwaj-K-R closed 7 months ago

Roopesh-Bharatwaj-K-R commented 7 months ago

Describe the issue

Hi, we are facing an issue with Azure Whisper AI when using it as a transcription service.

We followed the base code from https://github.com/microsoft/autogen/blob/main/notebook/agentchat_video_transcript_translate_with_whisper.ipynb

The only change was the Azure Whisper integration with our specific APIs and keys.

Unfortunately, this leads to errors: instead of using the transcription functions, the chatbot generates its own speech-recognition scripts.

The full error output is given below.

Kindly let me know how to overcome this issue. Thanks in advance.

Roopesh

Steps to reproduce

We followed the same steps as the given example notebook:

https://github.com/microsoft/autogen/blob/main/notebook/agentchat_video_transcript_translate_with_whisper.ipynb

We created a new environment and installed:

%pip install moviepy~=1.0.3

%pip install openai-whisper

%pip install openai~=1.3.5

%pip install "pyautogen>=0.2.3"
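To confirm which versions actually ended up in the new environment, a check along these lines could be run. This is only a sketch using the standard library; the helper name `installed_versions` is made up for illustration, and the package list is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a {package: version-or-None} mapping for the given distribution names."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # None marks a package that is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names as used in the pip commands above.
    for name, ver in installed_versions(
        ["moviepy", "openai-whisper", "openai", "pyautogen"]
    ).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

A value of `None` would mean the package was installed into a different environment than the one running the script.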

Screenshots and logs

Below is the error output:


    `/Users/roopeshbharatwajkr/anaconda3/envs/Autogen/bin/python 
    /Users/roopeshbharatwajkr/PycharmProjects/Autogen/autogen/EDIH_Autogen/autogent_test.py 
    user_proxy (to chatbot):

    For the video located in /Users/roopeshbharatwajkr/Downloads/Jurgen.mp4, recognize the speech and transfer it into a 
    script file, then translate from english text to a english video subtitle text, and transfer it into a script file called 
     transcribed.txt. 

    --------------------------------------------------------------------------------
    chatbot (to user_proxy):

    I'm sorry, but I do not have access to your local machine to execute the task. Can you please provide me with a function 
    that performs the required tasks and the relevant input files?

    --------------------------------------------------------------------------------
    user_proxy (to chatbot):
   --------------------------------------------------------------------------------
   chatbot (to user_proxy):

    Unfortunately, I cannot execute any code on your local machine. However, assuming that you have installed the 
   necessary libraries and have the video file saved in the specified path, you can use the following code to achieve the 
    required tasks:

          ```python
          import speech_recognition as sr
          from googletrans import Translator

          def recognize_transcript_from_video(audio_filepath):
              r = sr.Recognizer()
              with sr.AudioFile(audio_filepath) as audio_file:
                  audio_data = r.record(audio_file)
                  transcript = r.recognize_google(audio_data)
              return transcript

          def translate_transcript(source_language, target_language, transcript):
              translator = Translator()
              translated_text = translator.translate(transcript, src=source_language, dest=target_language).text
              return translated_text

          video_path = '/Users/roopeshbharatwajkr/Downloads/Jurgen.mp4'
          transcript = recognize_transcript_from_video(video_path)
          translated_text = translate_transcript('en', 'en', transcript)

          with open('transcribed.txt', 'w') as f:
              f.write(translated_text)
          ```

      Note that this code uses the SpeechRecognition library to transcribe the audio from the video, and the googletrans library to translate the transcript. If you have not installed these libraries, you can do so using pip:

      ```python
      pip install SpeechRecognition googletrans
      ```

      --------------------------------------------------------------------------------

      >>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
      user_proxy (to chatbot):

      exitcode: 1 (execution failed)
      Code output: 
      Traceback (most recent call last):
        File "", line 2, in <module>
          from googletrans import Translator
        File "/Users/roopeshbharatwajkr/anaconda3/envs/Autogen/lib/python3.10/site-packages/googletrans/__init__.py", line 6, in <module>
          from googletrans.client import Translator
        File "/Users/roopeshbharatwajkr/anaconda3/envs/Autogen/lib/python3.10/site-packages/googletrans/client.py", line 30, in <module>
          class Translator:
        File "/Users/roopeshbharatwajkr/anaconda3/envs/Autogen/lib/python3.10/site-packages/googletrans/client.py", line 62, in Translator
          proxies: typing.Dict[str, httpcore.SyncHTTPTransport] = None,
      AttributeError: module 'httpcore' has no attribute 'SyncHTTPTransport'

      --------------------------------------------------------------------------------
      chatbot (to user_proxy):

      It seems that the issue is with the version of the `httpcore` library that you are using. The `SyncHTTPTransport` class is only available in `httpcore` version 0.14.0 and above.

      You can try upgrading the `httpcore` library to a more recent version by running:

      ```python
      pip install --upgrade httpcore
      ```

      If this does not resolve the issue, you can try using a different translation library such as `translate` or `py-googletrans`. Here is a modified version of the code that uses the `translate` library instead:

      ```python
      import speech_recognition as sr
      from translate import Translator

      def recognize_transcript_from_video(audio_filepath):
          r = sr.Recognizer()
          with sr.AudioFile(audio_filepath) as audio_file:
              audio_data = r.record(audio_file)
              transcript = r.recognize_google(audio_data)
          return transcript

      def translate_transcript(source_language, target_language, transcript):
          translator = Translator(to_lang=target_language)
          translated_text = translator.translate(transcript)
          return translated_text

      video_path = '/Users/roopeshbharatwajkr/Downloads/Jurgen.mp4'
      transcript = recognize_transcript_from_video(video_path)
      translated_text = translate_transcript('en', 'en', transcript)

      with open('transcribed.txt', 'w') as f:
          f.write(translated_text)
      ```

      This code uses the `translate` library to translate the transcript. If you have not installed the library, you can do so using pip:

      ```python
      pip install translate
      ```

      --------------------------------------------------------------------------------

      >>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
      user_proxy (to chatbot):

      exitcode: 1 (execution failed)
      Code output: 
        File "", line 1
          pip install --upgrade httpcore
              ^^^^^^^
      SyntaxError: invalid syntax

      --------------------------------------------------------------------------------
      chatbot (to user_proxy):

      I apologize for the confusion. The command `pip install --upgrade httpcore` should be run in your terminal or command prompt, not in the Python interpreter. 

      First, exit from the current Python session by typing `exit()` in the interpreter and pressing Enter. Then, open a new terminal or command prompt and run the command `pip install --upgrade httpcore` before running the previous code.

      Once you have upgraded the `httpcore` library, you can run the previous code to transcribe and translate the video.

      --------------------------------------------------------------------------------
      user_proxy (to chatbot):

      --------------------------------------------------------------------------------
      chatbot (to user_proxy):

      I see that the previous response was cut off. Can you please provide me with the full message/error output? Also, please let me know if you have upgraded the `httpcore` library before running the previous code.

      --------------------------------------------------------------------------------
      user_proxy (to chatbot):

      --------------------------------------------------------------------------------
      chatbot (to user_proxy):

      I see. It seems that you are still encountering issues with the `httpcore` library even after upgrading it.

      In this case, you can try using the `py-googletrans` library as an alternative. Here's an updated version of the code that uses `py-googletrans` to translate the transcript:

      ```python
      import speech_recognition as sr
      from googletrans import LANGUAGES
      from pygoogletranslation import Translator

      def recognize_transcript_from_video(audio_filepath):
          r = sr.Recognizer()
          with sr.AudioFile(audio_filepath) as audio_file:
              audio_data = r.record(audio_file)
              transcript = r.recognize_google(audio_data)
          return transcript

      def translate_transcript(source_language, target_language, transcript):
          translator = Translator(source=source_language, target=target_language)
          translated_text = translator.translate(transcript)
          return translated_text

      video_path = '/Users/roopeshbharatwajkr/Downloads/Jurgen.mp4'
      transcript = recognize_transcript_from_video(video_path)
      translated_text = translate_transcript('en', 'en', transcript)

      with open('transcribed.txt', 'w') as f:
          f.write(translated_text)
      ```

      Note that this code uses `py-googletrans` instead of `googletrans` to perform the translation. You can install `py-googletrans` with pip:

      ```python
      pip install pygoogletranslation
      ```

      Please let me know if you still encounter any issues with this code.

      --------------------------------------------------------------------------------

      Process finished with exit code 0
      `


### Additional Information

Python: 3.10.13
OS: macOS
pyautogen: 0.2.3

jackgerrits commented 7 months ago

What are the contents of the file you're running, autogent_test.py?

Roopesh-Bharatwaj-K-R commented 7 months ago

The base content is this example: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_video_transcript_translate_with_whisper.ipynb. There are no code changes other than the Azure Whisper AI integration, API and keys.

jackgerrits commented 7 months ago

Okay, I just ran through it. I think the issue might have been that the notebook was using the old function registration syntax, so perhaps the agent didn't know it could use the functions. I have a PR incoming that updates the notebook to fix this.
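To illustrate why registration matters, here is a minimal, library-free sketch of the general idea. It is not AutoGen's API: `ToolRegistry`, `register`, and `dispatch` are hypothetical names, and the function body is a placeholder rather than a real Whisper call. The point is only that the model can request a function solely when its schema was advertised, and the executing side can run it solely when the callable was registered under the same name. If either half is missing, the model falls back to writing its own script, as in the log above.

```python
import json
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical helper (not part of AutoGen) that pairs a schema with a callable."""

    def __init__(self) -> None:
        self.schemas: List[dict] = []  # what would be advertised to the model
        self.function_map: Dict[str, Callable] = {}  # what the executor can run

    def register(self, description: str, parameters: dict):
        def decorator(func: Callable) -> Callable:
            self.schemas.append(
                {
                    "name": func.__name__,
                    "description": description,
                    "parameters": parameters,
                }
            )
            self.function_map[func.__name__] = func
            return func

        return decorator

    def dispatch(self, function_call: dict):
        """Run a function call of the form {"name": ..., "arguments": "<json>"}."""
        name = function_call["name"]
        if name not in self.function_map:
            raise KeyError(f"function {name!r} was never registered")
        kwargs = json.loads(function_call["arguments"])
        return self.function_map[name](**kwargs)


registry = ToolRegistry()


@registry.register(
    description="Recognize the speech in a video and return the transcript.",
    parameters={
        "type": "object",
        "properties": {
            "audio_filepath": {"type": "string", "description": "Path to the video file"}
        },
        "required": ["audio_filepath"],
    },
)
def recognize_transcript_from_video(audio_filepath: str) -> str:
    # Placeholder body: the real function would call Whisper here.
    return f"<transcript of {audio_filepath}>"


# Simulated function call, shaped like what a model would send back.
call = {
    "name": "recognize_transcript_from_video",
    "arguments": json.dumps({"audio_filepath": "video.mp4"}),
}
result = registry.dispatch(call)
print(result)
```

As far as I understand, the newer decorator-based registration in AutoGen (`register_for_llm` on the assistant, `register_for_execution` on the user proxy) covers these same two halves, but check the updated notebook for the exact syntax.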