langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

`deeplake` adds significantly more dependencies in default installation #1396

Closed zhengligs closed 1 year ago

zhengligs commented 1 year ago

I noticed that installing langchain with `pip install langchain` has recently started pulling in many more packages.

Here is the dependency map shown by johnnydep:

name                                             summary
-----------------------------------------------  -------------------------------------------------------------------------------------------------------
langchain                                        Building applications with LLMs through composability
├── PyYAML<7,>=6                                 YAML parser and emitter for Python
├── SQLAlchemy<2,>=1                             Database Abstraction Library
│   └── greenlet!=0.4.17                         Lightweight in-process concurrent programming
├── aiohttp<4.0.0,>=3.8.3                        Async http client/server framework (asyncio)
│   ├── aiosignal>=1.1.2                         aiosignal: a list of registered asynchronous callbacks
│   │   └── frozenlist>=1.1.0                    A list-like structure which implements collections.abc.MutableSequence
│   ├── async-timeout<5.0,>=4.0.0a3              Timeout context manager for asyncio programs
│   ├── attrs>=17.3.0                            Classes Without Boilerplate
│   ├── charset-normalizer<4.0,>=2.0             The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   ├── frozenlist>=1.1.1                        A list-like structure which implements collections.abc.MutableSequence
│   ├── multidict<7.0,>=4.5                      multidict implementation
│   └── yarl<2.0,>=1.0                           Yet another URL library
│       ├── idna>=2.0                            Internationalized Domain Names in Applications (IDNA)
│       └── multidict>=4.0                       multidict implementation
├── aleph-alpha-client<3.0.0,>=2.15.0            python client to interact with Aleph Alpha api endpoints
│   ├── aiodns>=3.0.0                            Simple DNS resolver for asyncio
│   │   └── pycares>=4.0.0                       Python interface for c-ares
│   │       └── cffi>=1.5.0                      Foreign Function Interface for Python calling C code.
│   │           └── pycparser                    C parser in Python
│   ├── aiohttp-retry>=2.8.3                     Simple retry client for aiohttp
│   │   └── aiohttp                              Async http client/server framework (asyncio)
│   │       ├── aiosignal>=1.1.2                 aiosignal: a list of registered asynchronous callbacks
│   │       │   └── frozenlist>=1.1.0            A list-like structure which implements collections.abc.MutableSequence
│   │       ├── async-timeout<5.0,>=4.0.0a3      Timeout context manager for asyncio programs
│   │       ├── attrs>=17.3.0                    Classes Without Boilerplate
│   │       ├── charset-normalizer<4.0,>=2.0     The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │       ├── frozenlist>=1.1.1                A list-like structure which implements collections.abc.MutableSequence
│   │       ├── multidict<7.0,>=4.5              multidict implementation
│   │       └── yarl<2.0,>=1.0                   Yet another URL library
│   │           ├── idna>=2.0                    Internationalized Domain Names in Applications (IDNA)
│   │           └── multidict>=4.0               multidict implementation
│   ├── aiohttp>=3.8.3                           Async http client/server framework (asyncio)
│   │   ├── aiosignal>=1.1.2                     aiosignal: a list of registered asynchronous callbacks
│   │   │   └── frozenlist>=1.1.0                A list-like structure which implements collections.abc.MutableSequence
│   │   ├── async-timeout<5.0,>=4.0.0a3          Timeout context manager for asyncio programs
│   │   ├── attrs>=17.3.0                        Classes Without Boilerplate
│   │   ├── charset-normalizer<4.0,>=2.0         The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │   ├── frozenlist>=1.1.1                    A list-like structure which implements collections.abc.MutableSequence
│   │   ├── multidict<7.0,>=4.5                  multidict implementation
│   │   └── yarl<2.0,>=1.0                       Yet another URL library
│   │       ├── idna>=2.0                        Internationalized Domain Names in Applications (IDNA)
│   │       └── multidict>=4.0                   multidict implementation
│   ├── requests>=2.28                           Python HTTP for Humans.
│   │   ├── certifi>=2017.4.17                   Python package for providing Mozilla's CA Bundle.
│   │   ├── charset-normalizer<4,>=2             The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │   ├── idna<4,>=2.5                         Internationalized Domain Names in Applications (IDNA)
│   │   └── urllib3<1.27,>=1.21.1                HTTP library with thread-safe connection pooling, file post, and more.
│   ├── tokenizers>=0.13.2                       Fast and Customizable Tokenizers
│   └── urllib3>=1.26                            HTTP library with thread-safe connection pooling, file post, and more.
├── dataclasses-json<0.6.0,>=0.5.7               Easily serialize dataclasses to and from JSON
│   ├── marshmallow-enum<2.0.0,>=1.5.1           Enum field for Marshmallow
│   │   └── marshmallow>=2.0.0                   A lightweight library for converting complex datatypes to and from native Python datatypes.
│   │       └── packaging>=17.0                  Core utilities for Python packages
│   ├── marshmallow<4.0.0,>=3.3.0                A lightweight library for converting complex datatypes to and from native Python datatypes.
│   │   └── packaging>=17.0                      Core utilities for Python packages
│   └── typing-inspect>=0.4.0                    Runtime inspection utilities for typing module.
│       ├── mypy-extensions>=0.3.0               Type system extensions for programs checked with the mypy type checker.
│       └── typing-extensions>=3.7.4             Backported and Experimental Type Hints for Python 3.7+
├── deeplake<4.0.0,>=3.2.9                       Activeloop Deep Lake
│   ├── boto3                                    The AWS SDK for Python
│   │   ├── botocore<1.30.0,>=1.29.82            Low-level, data-driven core of boto 3.
│   │   │   ├── jmespath<2.0.0,>=0.7.1           JSON Matching Expressions
│   │   │   ├── python-dateutil<3.0.0,>=2.1      Extensions to the standard Python datetime module
│   │   │   │   └── six>=1.5                     Python 2 and 3 compatibility utilities
│   │   │   └── urllib3<1.27,>=1.25.4            HTTP library with thread-safe connection pooling, file post, and more.
│   │   ├── jmespath<2.0.0,>=0.7.1               JSON Matching Expressions
│   │   └── s3transfer<0.7.0,>=0.6.0             An Amazon S3 Transfer Manager
│   │       └── botocore<2.0a.0,>=1.12.36        Low-level, data-driven core of boto 3.
│   │           ├── jmespath<2.0.0,>=0.7.1       JSON Matching Expressions
│   │           ├── python-dateutil<3.0.0,>=2.1  Extensions to the standard Python datetime module
│   │           │   └── six>=1.5                 Python 2 and 3 compatibility utilities
│   │           └── urllib3<1.27,>=1.25.4        HTTP library with thread-safe connection pooling, file post, and more.
│   ├── click                                    Composable command line interface toolkit
│   ├── hub>=2.8.7                               Activeloop Deep Lake
│   │   └── deeplake                             Activeloop Deep Lake
│   │       └── ...                              ... <circular dependency marker for deeplake -> hub -> deeplake>
│   ├── humbug>=0.2.6                            Humbug: Do you build developer tools? Humbug helps you know your users.
│   │   └── requests                             Python HTTP for Humans.
│   │       ├── certifi>=2017.4.17               Python package for providing Mozilla's CA Bundle.
│   │       ├── charset-normalizer<4,>=2         The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │       ├── idna<4,>=2.5                     Internationalized Domain Names in Applications (IDNA)
│   │       └── urllib3<1.27,>=1.21.1            HTTP library with thread-safe connection pooling, file post, and more.
│   ├── numcodecs                                A Python package providing buffer compression and transformation codecs for use
│   │   ├── entrypoints                          Discover and load entry points from installed packages.
│   │   └── numpy>=1.7                           Fundamental package for array computing in Python
│   ├── numpy                                    Fundamental package for array computing in Python
│   ├── pathos                                   parallel graph management and execution in heterogeneous computing
│   │   ├── dill>=0.3.6                          serialize all of python
│   │   ├── multiprocess>=0.70.14                better multiprocessing and multithreading in python
│   │   │   └── dill>=0.3.6                      serialize all of python
│   │   ├── pox>=0.3.2                           utilities for filesystem exploration and automated builds
│   │   └── ppft>=1.7.6.6                        distributed and parallel python
│   ├── pillow                                   Python Imaging Library (Fork)
│   ├── pyjwt                                    JSON Web Token implementation in Python
│   └── tqdm                                     Fast, Extensible Progress Meter
├── numpy<2,>=1                                  Fundamental package for array computing in Python
├── pydantic<2,>=1                               Data validation and settings management using python type hints
│   └── typing-extensions>=4.2.0                 Backported and Experimental Type Hints for Python 3.7+
├── requests<3,>=2                               Python HTTP for Humans.
│   ├── certifi>=2017.4.17                       Python package for providing Mozilla's CA Bundle.
│   ├── charset-normalizer<4,>=2                 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   ├── idna<4,>=2.5                             Internationalized Domain Names in Applications (IDNA)
│   └── urllib3<1.27,>=1.21.1                    HTTP library with thread-safe connection pooling, file post, and more.
└── tenacity<9.0.0,>=8.1.0                       Retry code until it succeeds

`deeplake` brings in many packages, even though it's marked as optional in `pyproject.toml`.
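One way to check what an installed release actually declares is to read its metadata and separate the unconditional requirements from those gated behind an extra. The following is a minimal sketch, not anything from the langchain codebase: `split_requirements` and `report` are illustrative names, and the marker check is a simple pattern match rather than a full PEP 508 parser.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

# A dependency tied to an extra carries an environment marker such as:
#   extra == "all"
_EXTRA_MARKER = re.compile(r"\bextra\s*==")


def split_requirements(requirement_lines):
    """Split Requires-Dist strings into (unconditional, extra_gated) lists."""
    unconditional, extra_gated = [], []
    for line in requirement_lines:
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = line.partition(";")
        if _EXTRA_MARKER.search(marker):
            extra_gated.append(line)
        else:
            unconditional.append(line)
    return unconditional, extra_gated


def report(dist_name):
    """Print which requirements of an installed distribution are unconditional."""
    try:
        lines = requires(dist_name) or []
    except PackageNotFoundError:
        print(f"{dist_name} is not installed")
        return
    unconditional, extra_gated = split_requirements(lines)
    print(f"{dist_name}: {len(unconditional)} unconditional, "
          f"{len(extra_gated)} behind an extra")
    for line in unconditional:
        print("  always installed:", line)


if __name__ == "__main__":
    report("langchain")
```

If a package that is meant to be optional shows up in the "always installed" list, the published metadata does not gate it behind an extra, whatever the source `pyproject.toml` intended.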

blob42 commented 1 year ago

+1 I don't think everything should be included by default.

torque59 commented 1 year ago

@hwchase17 FYI, deeplake also brings in a cyclic dependency, which I have raised here: https://github.com/activeloopai/deeplake/issues/2220

miraculixx commented 1 year ago

+1, this should be documented somewhere, in particular since deeplake pulls in humbug for automatic usability tracking (by default presuming user consent). That is, by installing `langchain[all]` you automatically participate in deeplake usage tracking, even if you don't actively use it. https://github.com/activeloopai/deeplake/issues/1754

mikayelh commented 1 year ago

@miraculixx I really don't think this is the case because the reporting works only after you use deeplake. Looping in @istranic to confirm.

miraculixx commented 1 year ago

@mikayelh Unfortunately yes, see below. In a nutshell, after `pip install langchain[all]` it is enough to `import langchain`, and all subsequent uncaught(?) exceptions will trigger `HumbugReport.publish()`.

Langchain attempts to import all supported vectorstores, including deeplake. If it is installed, it will import deeplake.

Upon `import deeplake`, a `HumbugReporter` is set up and an exception hook is added. That is, any future exception triggers a `reporter.publish()` call to https://spire.bugout.dev.
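To make the mechanism concrete, here is a minimal sketch of how a library can hook every uncaught exception simply by being imported. It is not deeplake's or humbug's actual code: `RecordingReporter` and `install_reporting_hook` are hypothetical names, and the reporter only records locally instead of sending anything over the network. The only real API used is the standard `sys.excepthook`.

```python
import sys


class RecordingReporter:
    """Hypothetical stand-in for a telemetry reporter; stores reports locally."""

    def __init__(self):
        self.published = []

    def publish(self, exc_type, exc_value):
        # A real reporter would send this to a remote endpoint.
        self.published.append(f"{exc_type.__name__}: {exc_value}")


def install_reporting_hook(reporter):
    """Wrap sys.excepthook so every uncaught exception is also reported.

    Returns the previous hook so the caller can restore it later.
    """
    previous_hook = sys.excepthook

    def hook(exc_type, exc_value, traceback):
        reporter.publish(exc_type, exc_value)
        # Chain to the previous hook so the traceback is still printed.
        previous_hook(exc_type, exc_value, traceback)

    sys.excepthook = hook
    return previous_hook


# If a library runs the equivalent of this at module level, then merely
# importing it affects every later uncaught exception in the process:
#
#     previous = install_reporting_hook(RecordingReporter())
#
# and the hook stays in place until something restores it:
#
#     sys.excepthook = previous
```

Because `sys.excepthook` is process-wide, the hook fires for exceptions raised anywhere in the program, not only in code that uses the library.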

istranic commented 1 year ago

Hi @miraculixx, thanks for digging into this. Yesterday we disabled reporting upon importing deeplake for an unrelated reason. We'll get rid of the exception hook, and that will eliminate all reporting that happens by virtue of only importing deeplake.
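For anyone who wants to confirm this locally against whichever version they have installed, a small check along these lines may help. It is a sketch under stated assumptions: `import_changes_excepthook` is an illustrative helper, and it only detects a replaced `sys.excepthook`, not other forms of reporting.

```python
import importlib
import sys


def import_changes_excepthook(module_name):
    """Return True if importing the module replaces sys.excepthook.

    Note: if the module was already imported in this process, the import
    is a no-op, so run this in a fresh interpreter.
    """
    before = sys.excepthook
    importlib.import_module(module_name)
    return sys.excepthook is not before


if __name__ == "__main__":
    # "deeplake" must be installed for this to run; any module name works.
    print(import_changes_excepthook("deeplake"))
```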

miraculixx commented 1 year ago

@istranic Great news, much appreciated!

istranic commented 1 year ago

Hi @miraculixx This PR was merged.

dosubot[bot] commented 1 year ago

Hi, @zhengligs! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you raised was about the deeplake package having a dependency issue where it adds a significant number of dependencies when installed, despite being marked as optional. The maintainers have acknowledged this issue and have made changes to disable reporting and eliminate the exception hook that triggers reporting upon importing deeplake. These changes have been merged and should resolve the problem.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!