zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: Error in the Materializer for integration with Langchain >= 0.0.325 #2012

Closed leoregino closed 10 months ago

leoregino commented 10 months ago

Contact Details [Optional]

No response

System Information

ZENML_LOCAL_VERSION: 0.42.1 ZENML_SERVER_VERSION: 0.42.1 ZENML_SERVER_DATABASE: mysql ZENML_SERVER_DEPLOYMENT_TYPE: gcp ZENML_CONFIG_DIR: xxxx ZENML_LOCAL_STORE_DIR: xxxx ZENML_SERVER_URL: (I'm a CAB client) ZENML_ACTIVE_REPOSITORY_ROOT: xxxx PYTHON_VERSION: 3.10.9 ENVIRONMENT: native SYSTEM_INFO: {'os': 'mac', 'mac_version': '13.5.2'} ACTIVE_WORKSPACE: default ACTIVE_STACK: local ACTIVE_USER: xxxx TELEMETRY_STATUS: enabled ANALYTICS_CLIENT_ID: xxxx ANALYTICS_USER_ID: xxxx ANALYTICS_SERVER_ID: xxxx INTEGRATIONS: ['gcp', 'kaniko', 'kubeflow', 'mlflow', 'openai', 'pillow', 'pytorch', 'scipy', 'sklearn', 'slack', 'langchain'] PACKAGES: {'regex': '2023.10.3', 'gcsfs': '2023.10.0', 'fsspec': '2023.10.0', 'certifi': '2023.7.22', 'jsonschema-specifications': '2023.7.1', 'pytz': '2022.7.1', 'setuptools': '65.5.0', 'cryptography': '41.0.5', 'kubernetes': '25.3.0', 'pyzmq': '25.1.1', 'gevent': '23.9.1', 'aiofiles': '23.2.1', 'packaging': '23.2', 'attrs': '23.1.0', 'argon2-cffi': '23.1.0', 'azure-mgmt-resource': '23.0.1', 'pip': '22.3.1', 'argon2-cffi-bindings': '21.2.0', 'isoduration': '20.11.0', 'gunicorn': '20.1.0', 'rich': '12.6.0', 'pyarrow': '11.0.0', 'pillow': '10.1.0', 'ipython': '8.17.2', 'jupyter-client': '8.5.0', 'tenacity': '8.2.3', 'click': '8.1.3', 'nbconvert': '7.10.0', 'ipywidgets': '7.8.1', 'overrides': '7.4.0', 'notebook': '7.0.6', 'ipykernel': '6.26.0', 'importlib-metadata': '6.8.0', 'tornado': '6.3.3', 'docker': '6.1.3', 'zope.interface': '6.1', 'bleach': '6.1.0', 'importlib-resources': '6.1.0', 'multidict': '6.0.4', 'pyyaml': '6.0.1', 'traitlets': '5.13.0', 'psutil': '5.9.6', 'nbformat': '5.9.2', 'jupyter-core': '5.5.0', 'cachetools': '5.3.2', 'decorator': '5.1.1', 'smmap': '5.0.1', 'zope.event': '5.0', 'tqdm': '4.66.1', 'fonttools': '4.43.1', 'transformers': '4.30.1', 'jsonschema': '4.19.2', 'beautifulsoup4': '4.12.2', 'rsa': '4.9', 'typing-extensions': '4.8.0', 'pexpect': '4.8.0', 'gitdb': '4.0.11', 'jupyterlab': '4.0.7', 'async-timeout': '4.0.3', 
'bcrypt': '4.0.1', 'slack-sdk': '3.23.0', 'google-cloud-build': '3.21.0', 'protobuf': '3.20.3', 'marshmallow': '3.20.1', 'zipp': '3.17.0', 'filelock': '3.13.1', 'platformdirs': '3.11.0', 'orjson': '3.9.10', 'aiohttp': '3.8.6', 'matplotlib': '3.8.1', 'nltk': '3.8.1', 'anyio': '3.7.1', 'widgetsnbextension': '3.6.6', 'google-cloud-bigquery': '3.6.0', 'markdown': '3.5.1', 'google-cloud-logging': '3.5.0', 'asyncio': '3.4.3', 'idna': '3.4', 'charset-normalizer': '3.3.2', 'oauthlib': '3.2.2', 'networkx': '3.2.1', 'threadpoolctl': '3.2.0', 'gitpython': '3.1.40', 'jinja2': '3.1.2', 'prompt-toolkit': '3.0.39', 'mistune': '3.0.2', 'greenlet': '3.0.1', 'werkzeug': '3.0.1', 'uritemplate': '3.0.1', 'tritonclient': '2.36.0', 'google-cloud-container': '2.33.0', 'requests': '2.31.0', 'jupyterlab-server': '2.25.0', 'google-auth': '2.23.4', 'google-cloud-bigquery-storage': '2.22.0', 'pycparser': '2.21', 'fastjsonschema': '2.18.1', 'google-cloud-secret-manager': '2.16.4', 'pygments': '2.16.1', 'babel': '2.13.1', 'google-cloud-storage': '2.13.0', 'google-api-core': '2.12.0', 'google-cloud-scheduler': '2.11.2', 'jupyter-server': '2.9.1', 'types-python-dateutil': '2.8.19.14', 'python-dateutil': '2.8.2', 'pyjwt': '2.8.0', 'google-resumable-media': '2.6.0', 'soupsieve': '2.5', 'pyparsing': '2.4.7', 'asttokens': '2.4.1', 'jsonpointer': '2.4', 'flask': '2.3.3', 'google-cloud-core': '2.3.3', 'termcolor': '2.3.0', 'sentence-transformers': '2.2.2', 'mlflow': '2.2.2', 'cloudpickle': '2.2.1', 'jupyter-lsp': '2.2.0', 'markupsafe': '2.1.3', 'itsdangerous': '2.1.2', 'torch': '2.1.0', 'python-json-logger': '2.0.7', 'async-lru': '2.0.4', 'geventhttpclient': '2.0.2', 'kafka-python': '2.0.2', 'tomli': '2.0.1', 'executing': '2.0.1', 'googleapis-common-protos': '1.61.0', 'grpcio': '1.59.2', 'grpcio-status': '1.48.2', 'jsonpatch': '1.33', 'azure-core': '1.29.5', 'urllib3': '1.26.18', 'google-cloud-aiplatform': '1.26.0', 'numpy': '1.22.4', 'proto-plus': '1.22.3', 'six': '1.16.0', 'cffi': '1.16.0', 'wrapt': 
'1.15.0', 'google-cloud-functions': '1.13.3', 'python-rapidjson': '1.13', 'webcolors': '1.13', 'google-api-python-client': '1.12.11', 'sympy': '1.12', 'scipy': '1.11.3', 'pydantic': '1.10.13', 'google-cloud-resource-manager': '1.10.4', 'backoff': '1.10.0', 'yarl': '1.9.2', 'kfp': '1.8.22', 'shapely': '1.8.5.post1', 'kfp-server-api': '1.8.5', 'pydata-google-auth': '1.8.2', 'send2trash': '1.8.2', 'alembic': '1.8.1', 'distro': '1.8.0', 'debugpy': '1.8.0', 'passlib': '1.7.4', 'blinker': '1.7.0', 'websocket-client': '1.6.4', 'monotonic': '1.6', 'nest-asyncio': '1.5.8', 'pandas': '1.5.3', 'fqdn': '1.5.1', 'google-crc32c': '1.5.0', 'pandocfilters': '1.5.0', 'sqlalchemy': '1.4.50', 'kiwisolver': '1.4.5', 'analytics-python': '1.4.post1', 'absl-py': '1.4.0', 'frozenlist': '1.4.0', 'azure-mgmt-core': '1.4.0', 'mlserver': '1.3.5', 'mlserver-mlflow': '1.3.5', 'joblib': '1.3.2', 'google-cloud-appengine-logging': '1.3.2', 'aiosignal': '1.3.1', 'requests-oauthlib': '1.3.1', 'arrow': '1.3.0', 'uri-template': '1.3.0', 'mpmath': '1.3.0', 'sniffio': '1.3.0', 'deprecated': '1.2.14', 'querystring-parser': '1.2.4', 'mako': '1.2.4', 'cloud-sql-python-connector': '1.2.3', 'scikit-learn': '1.2.2', 'tinycss2': '1.2.1', 'azure-common': '1.1.28', 'jupyterlab-widgets': '1.1.7', 'exceptiongroup': '1.1.3', 'contourpy': '1.1.1', 'db-dtypes': '1.1.1', 'google-auth-oauthlib': '1.1.0', 'brotli': '1.1.0', 'pymysql': '1.0.3', 'mypy-extensions': '1.0.0', 'python-dotenv': '1.0.0', 'fastapi': '0.89.1', 'numba': '0.58.1', 'shap': '0.43.0', 'zenml': '0.42.1', 'wheel': '0.41.3', 'llvmlite': '0.41.1', 'sqlalchemy-utils': '0.38.3', 'referencing': '0.30.2', 'openai': '0.27.9', 'asyncpg': '0.27.0', 'uvicorn': '0.23.2', 'starlette': '0.22.0', 'pandas-gbq': '0.19.1', 'jedi': '0.19.1', 'httplib2': '0.19.1', 'uvloop': '0.19.0', 'validators': '0.18.2', 'databricks-cli': '0.18.0', 'prometheus-client': '0.18.0', 'huggingface-hub': '0.18.0', 'terminado': '0.17.1', 'starlette-exporter': '0.16.0', 'torchvision': '0.16.0', 
'docstring-parser': '0.15', 'h11': '0.14.0', 'tokenizers': '0.13.3', 'grpc-google-iam-v1': '0.12.6', 'cycler': '0.12.1', 'rpds-py': '0.10.6', 'python-terraform': '0.10.1', 'requests-toolbelt': '0.10.1', 'json5': '0.9.14', 'commonmark': '0.9.1', 'typer': '0.9.0', 'typing-inspect': '0.9.0', 'tabulate': '0.9.0', 'parso': '0.8.3', 'aiokafka': '0.8.1', 'jupyter-events': '0.8.0', 'nbclient': '0.8.0', 'defusedxml': '0.7.1', 'ptyprocess': '0.7.0', 'py-grpc-prometheus': '0.7.0', 'stack-data': '0.6.3', 'dataclasses-json': '0.6.1', 'isodate': '0.6.1', 'webencodings': '0.5.1', 'fire': '0.5.0', 'pyasn1': '0.5.0', 'sqlparse': '0.4.4', 'jupyter-server-terminals': '0.4.4', 'entrypoints': '0.4', 'safetensors': '0.4.0', 'pyasn1-modules': '0.3.0', 'click-params': '0.3.0', 'wcwidth': '0.2.9', 'google-cloud-audit-log': '0.2.5', 'notebook-shim': '0.2.3', 'pure-eval': '0.2.2', 'jupyterlab-pygments': '0.2.2', 'ipython-genutils': '0.2.0', 'sentencepiece': '0.1.99', 'kfp-pipeline-spec': '0.1.16', 'strip-hints': '0.1.10', 'pgvector': '0.1.8', 'matplotlib-inline': '0.1.6', 'rfc3339-validator': '0.1.4', 'comm': '0.1.4', 'appnope': '0.1.3', 'google-auth-httplib2': '0.1.1', 'rfc3986-validator': '0.1.1', 'langchain': '0.0.325', 'langsmith': '0.0.56', 'sqlmodel': '0.0.11', 'slicer': '0.0.7', 'sqlalchemy2-stubs': '0.0.2a36'}

CURRENT STACK

Name: local ID: xxxx Shared: No User: xxxx Workspace: xxxx

ORCHESTRATOR: default

Name: default ID: xxxx Type: orchestrator Flavor: local Configuration: {} Shared: No User: xxxx Workspace: default / xxxx

ARTIFACT_STORE: default

Name: default ID: xxxx Type: artifact_store Flavor: local Configuration: {'path': ''} Shared: No User: xxxx Workspace: default / xxxx

EXPERIMENT_TRACKER: gke_mlflow_experiment_tracker

Name: gke_mlflow_experiment_tracker ID: xxxxx Type: experiment_tracker Flavor: mlflow Configuration: {'experiment_name': None, 'nested': False, 'tags': {}, 'tracking_uri': 'http://34.140.227.247/mlflow/', 'tracking_username': '****', 'tracking_password': '****', 'tracking_token': '****', 'tracking_insecure_tls': False, 'databricks_host': None} Shared: No User: xxxx Workspace: default / xxxx

ALERTER: slack_alerter

Name: slack_alerter ID: xxxx Type: alerter Flavor: slack Configuration: {'slack_token': '****', 'default_slack_channel_id': 'xxxxxx'} Shared: No User: xxxx Workspace: default / xxxx

IMAGE_BUILDER: local_builder

Name: local_builder ID: xxxxx Type: image_builder Flavor: local Configuration: {} Shared: No User: xxxx Workspace: default / xxxx

What happened?

After upgrading LangChain to version 0.0.325, the materializer crashes with the following error:

ImportError: cannot import name 'VectorStore' from 'langchain.vectorstores'

This can be fixed by changing one import line in the file vector_store_materializer.py:

OLD CODE: from langchain.vectorstores import VectorStore

NEW CODE: from langchain.vectorstores.base import VectorStore
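If the materializer needs to keep working across LangChain versions, one option is to try several import locations in order. The following is only a sketch of that idea, not the fix that ZenML shipped: the helper `import_first_available` and the constant `VECTOR_STORE_MODULES` are hypothetical names introduced here, and the two module paths are the ones quoted in this report.

```python
import importlib
from typing import Any, Sequence


def import_first_available(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            # import_module raises ImportError if the module is missing;
            # getattr raises AttributeError if the module lacks the name.
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{path}: {exc!r}")
    raise ImportError(
        f"cannot import name {name!r} from any of: " + "; ".join(errors)
    )


# Candidate locations from this report: the new path first,
# the old path as a fallback for earlier LangChain releases.
VECTOR_STORE_MODULES = ("langchain.vectorstores.base", "langchain.vectorstores")
```

With LangChain installed, `VectorStore = import_first_available("VectorStore", VECTOR_STORE_MODULES)` would then resolve the class from whichever location exists. Whether the `.base` path is available in every older release is an assumption that should be checked against the versions you support.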

Although this error could be avoided by simply downgrading LangChain, it's important to note that all versions prior to 0.0.325 have a **critical vulnerability/security issue**, as described here: https://github.com/advisories/GHSA-prgp-w7vf-ch62

So, it's very important to upgrade Langchain versions and fix this issue.
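To check whether an environment is still on an affected release, a small version comparison can help. This is a minimal sketch that takes 0.0.325 as the first unaffected version, as stated in this report; the function names are hypothetical, and the parser only handles plain dotted numeric versions such as `0.0.325` (no pre-release or local suffixes).

```python
from importlib import metadata
from typing import Optional, Tuple

# First version without the vulnerability, according to this report.
FIRST_PATCHED = (0, 0, 325)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.0.325' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_patched(version: str) -> bool:
    """Return True if `version` is at or above FIRST_PATCHED."""
    return parse_version(version) >= FIRST_PATCHED


def installed_langchain_version() -> Optional[str]:
    """Return the installed langchain version, or None if it is not installed."""
    try:
        return metadata.version("langchain")
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_patched(installed_langchain_version() or "0.0.0")` reports an environment without LangChain as unpatched, which may or may not be the behavior you want.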

Reproduction steps

  1. Install zenml >= 0.42.1
  2. Install langchain >= 0.0.325
  3. Create a pipeline using langchain in a file run.py
  4. Execute python run.py

Relevant log output

No response


strickvl commented 10 months ago

Thank you for this report and your suggestion. As it happens, next week we'll be upgrading this dependency + related things like materializers, so this will be fixed. I'll write here once it's merged onto develop.

strickvl commented 10 months ago

We bumped our langchain integration as I mentioned, so this issue should now be fixed. Thanks for reporting it here @leoregino!