langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

No possibility to define WhisperModel compute_type when using GenericLoader with blob_parser=FasterWhisperParser #23953

Open AleksNeStu opened 1 week ago

AleksNeStu commented 1 week ago

Checked other resources

Example Code

import torch
from langchain_community.document_loaders import YoutubeAudioLoader
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers.audio import (
    FasterWhisperParser
)

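# Use the GPU when available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Intended compute type: float16 on GPU, int8 on CPU (float32 is another option).
compute_type = 'float16' if device == 'cuda' else 'int8'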

yt_video_url = 'https://www.youtube.com/watch?v=1bUy-1hGZpI&ab_channel=IBMTechnology'
yt_loader_faster_whisper = GenericLoader(
    blob_loader=YoutubeAudioLoader([yt_video_url], '.'),
    # FasterWhisperParser exposes no compute_type parameter; on CPU this call fails with:
    # ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
    blob_parser=FasterWhisperParser(device=device)
    # Desired call (not currently supported):
    # blob_parser=FasterWhisperParser(device=device, compute_type=compute_type)
)
yt_data = yt_loader_faster_whisper.load()

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "python/helpers/pydev/pydevd.py", line 1551, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "AI-POC/frameworks/langchain/01_chat_with_data/main.py", line 133, in <module>
    docs_load()
  File "AI-POC/frameworks/langchain/01_chat_with_data/main.py", line 123, in docs_load
    get_youtube(use_paid_services=False, faster_whisper=True, wisper_local=False)
  File "AI-POC/frameworks/langchain/01_chat_with_data/main.py", line 108, in get_youtube
    yt_data = yt_loader_faster_whisper.load()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AI-POC/.venv/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 29, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "AI-POC/.venv/lib/python3.11/site-packages/langchain_community/document_loaders/generic.py", line 116, in lazy_load
    yield from self.blob_parser.lazy_parse(blob)
  File "AI-POC/.venv/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/audio.py", line 467, in lazy_parse
    model = WhisperModel(
            ^^^^^^^^^^^^^
  File "AI-POC/.venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 145, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.

Description

I'm using the FasterWhisperParser class from the langchain_community package to transcribe audio. I want it to run on a GPU when one is available and fall back to the CPU otherwise.

To do that, I need compute_type to be 'float16' on a GPU and 'int8' on a CPU. However, FasterWhisperParser doesn't accept a compute_type argument, and judging by the traceback it requests float16 regardless of the device. As a result, running on a CPU raises a ValueError, because float16 computation isn't efficiently supported there.
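For reference, this is roughly the device-dependent selection I would like to be able to pass through. It is only a sketch: pick_compute_type and load_whisper_model are hypothetical helper names, and the "base" model size is an assumption. It calls faster_whisper's WhisperModel directly rather than going through the LangChain parser.

```python
def pick_compute_type(device: str) -> str:
    """Return a compute type suited to the device (hypothetical helper)."""
    # float16 is efficient on CUDA GPUs; int8 is a safer choice on CPU.
    return "float16" if device == "cuda" else "int8"


def load_whisper_model(model_size: str = "base", device: str = "cpu"):
    """Build a faster-whisper model with a device-appropriate compute type."""
    # Imported lazily so pick_compute_type works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_size,
        device=device,
        compute_type=pick_compute_type(device),
    )
```

Calling load_whisper_model(device="cpu") would then request int8 instead of float16, which should avoid the ValueError above.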

System Info

$ python -m langchain_core.sys_info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP PREEMPT_DYNAMIC Thu May 11 15:56:33 UTC 2023
> Python Version:  3.11.6 (main, Oct  3 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)]

Package Information
-------------------
> langchain_core: 0.2.11
> langchain: 0.2.6
> langchain_community: 0.2.6
> langsmith: 0.1.83
> langchain_text_splitters: 0.2.2

Packages not installed (Not Necessarily a Problem)
--------------------------------------------------
The following packages were not found:

> langgraph
> langserve

AleksNeStu commented 1 week ago

Root cause:

https://github.com/langchain-ai/langchain/blob/ee579c77c1691bdf6b39aef649e1570516917e28/libs/community/langchain_community/document_loaders/parsers/audio.py#L468
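One possible shape for a fix, sketched under assumptions: the parser takes an optional compute_type and forwards it when the model is built. ComputeTypeAwareParser, build_model, and model_factory are hypothetical names used only to show the parameter flow. This is not the langchain_community class, and the injectable factory is there purely so the sketch can be exercised without faster-whisper installed.

```python
from typing import Any, Callable, Optional


class ComputeTypeAwareParser:
    """Hypothetical stand-in showing how compute_type could be forwarded."""

    def __init__(
        self,
        *,
        device: str = "cuda",
        model_size: str = "base",
        compute_type: Optional[str] = None,
        model_factory: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.device = device
        self.model_size = model_size
        # If the caller gives no compute type, keep float16 on GPU and use int8 elsewhere.
        self.compute_type = compute_type or (
            "float16" if device == "cuda" else "int8"
        )
        self._model_factory = model_factory

    def build_model(self) -> Any:
        factory = self._model_factory
        if factory is None:
            # Lazy import: only needed when no factory was injected.
            from faster_whisper import WhisperModel

            factory = WhisperModel
        # Forward the configured compute type instead of a hardcoded value.
        return factory(
            self.model_size,
            device=self.device,
            compute_type=self.compute_type,
        )
```

With something like this, an explicit compute_type would override the default, and CPU users would get int8 without any extra arguments.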