langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

BlobLoader.yield_blobs returns Iterable, it should be Iterator like in BaseLoader #25718

Open gabayben opened 2 weeks ago

gabayben commented 2 weeks ago

Example Code

class BlobLoader(ABC):
    """Abstract interface for blob loaders implementation.

    Implementer should be able to load raw content from a storage system according
    to some criteria and return the raw content lazily as a stream of blobs.
    """

    @abstractmethod
    def yield_blobs(
        self,
    ) -> Iterable[Blob]:
        """A lazy loader for raw data represented by LangChain's Blob object.

        Returns:
            A generator over blobs
        """

Error Message and Stack Trace (if applicable)

No response

Description

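BlobLoader.yield_blobs is annotated as returning Iterable[Blob]. It should be annotated as returning Iterator[Blob], consistent with BaseLoader, whose lazy_load returns Iterator[Document].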
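
To illustrate why the distinction matters to callers, here is a minimal sketch. FakeBlob and both functions are hypothetical stand-ins written for this example, not LangChain code. An Iterable annotation allows an implementation to return a plain list, on which next() fails, while an Iterator annotation guarantees that next() works.

```python
from collections.abc import Iterable, Iterator


class FakeBlob:
    """Hypothetical stand-in for LangChain's Blob; not the real class."""

    def __init__(self, data: str) -> None:
        self.data = data


def blobs_as_iterable() -> Iterable[FakeBlob]:
    # A list satisfies Iterable, so this implementation matches the annotation.
    return [FakeBlob("a"), FakeBlob("b")]


def blobs_as_iterator() -> Iterator[FakeBlob]:
    # Calling a generator function returns a generator, which is an Iterator.
    yield FakeBlob("a")
    yield FakeBlob("b")


iterable_result = blobs_as_iterable()
iterator_result = blobs_as_iterator()

# Both results work in a for loop, but only the Iterator supports next().
first = next(iterator_result)

try:
    next(iterable_result)  # a list is Iterable but not an Iterator
    next_on_list_failed = False
except TypeError:
    next_on_list_failed = True
```

With the narrower Iterator annotation, a type checker can accept next(loader.yield_blobs()) without the caller having to wrap the result in iter() first.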


gbaian10 commented 2 weeks ago

I tried it, and it does indeed return an Iterable object instead of an Iterator.

from collections.abc import Iterable, Iterator

from langchain_community.document_loaders.blob_loaders.youtube_audio import (
    YoutubeAudioLoader,
)
from langchain_core.documents.base import Blob

urls = ["https://www.youtube.com/watch?v=xxxxx"]
save_dir = r"your_save_dir"
loader = YoutubeAudioLoader(urls, save_dir)
for x in loader.yield_blobs():
    print(isinstance(x, Iterator))  # False
    print(isinstance(x, Iterable))  # True
    print(isinstance(x, Blob))  # True
    for i in x:
        print(i)  # ok
    next(x)  # TypeError: 'Blob' object is not an iterator

gbaian10 commented 2 weeks ago

I see. I mistakenly thought each Blob needed to be an Iterator. The value returned by yield_blobs() is indeed an Iterator, because it is a generator.
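
A small sketch of the corrected check, run against the return value of the method rather than against each yielded item. FakeBlob and FakeBlobLoader are hypothetical stand-ins so the snippet runs without LangChain or a download; the assumption is that the real loader's yield_blobs is also a generator function.

```python
from collections.abc import Generator, Iterable, Iterator


class FakeBlob:
    """Hypothetical stand-in for LangChain's Blob; not the real class."""

    def __init__(self, path: str) -> None:
        self.path = path


class FakeBlobLoader:
    """Hypothetical loader whose yield_blobs is a generator function."""

    def yield_blobs(self) -> Iterable[FakeBlob]:
        # The annotation says Iterable, but the body uses yield,
        # so calling the method returns a generator object.
        yield FakeBlob("first.m4a")
        yield FakeBlob("second.m4a")


result = FakeBlobLoader().yield_blobs()

# Check the object returned by the method, not the items it yields.
checks = {
    "Iterable": isinstance(result, Iterable),    # True
    "Iterator": isinstance(result, Iterator),    # True
    "Generator": isinstance(result, Generator),  # True
}

# next() works on the returned generator and produces a blob.
first = next(result)
item_is_iterator = isinstance(first, Iterator)  # False: the item is a blob
```

At runtime the returned object is already an Iterator, so the issue concerns only the static annotation.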


If it's explicitly a generator, I'd prefer it to be written as Generator[Blob, None, None].

However, it seems that LangChain often writes generators as Iterator[...]?
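
For comparison, a minimal sketch of the two annotation styles on otherwise identical generator functions. The function names are made up for this example. Both annotations are valid for a generator function that only yields values: Iterator[...] is the shorter form, while Generator[..., None, None] also spells out the send and return types.

```python
from collections.abc import Generator, Iterator
from types import GeneratorType


def names_as_iterator() -> Iterator[str]:
    # Shorter annotation: callers only learn that they can iterate and call next().
    yield "a.txt"
    yield "b.txt"


def names_as_generator() -> Generator[str, None, None]:
    # Explicit annotation: yield type str, send type None, return type None.
    yield "a.txt"
    yield "b.txt"


from_iterator = names_as_iterator()
from_generator = names_as_generator()

# The annotation does not change the runtime object: both are generators.
same_runtime_type = (
    isinstance(from_iterator, GeneratorType)
    and isinstance(from_generator, GeneratorType)
)

# Both produce the same values.
values_match = list(from_iterator) == list(from_generator)
```

The choice is therefore about what the signature promises to type checkers and readers, not about runtime behavior.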