lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

ImportError when packaging a standalone application with PyInstaller #1319

Open Mazzesy opened 10 months ago

Mazzesy commented 10 months ago

When I package my application with PyInstaller, the resulting executable fails with an error related to the "lark" library. The error is raised when the SelfQueryRetriever from langchain is instantiated:

Traceback (most recent call last):
  File "test.py", line 39, in <module>
  File "langchain\retrievers\self_query\base.py", line 144, in from_llm
  File "langchain\chains\query_constructor\base.py", line 154, in load_query_constructor_chain
  File "langchain\chains\query_constructor\base.py", line 115, in _get_prompt
  File "langchain\chains\query_constructor\base.py", line 72, in from_components
  File "langchain\chains\query_constructor\parser.py", line 150, in get_parser
ImportError: Cannot import lark, please install it with 'pip install lark'.

I have already ensured that the "lark" library is installed using the appropriate command: pip install lark.

I have also tried adding a hook-lark.py file for PyInstaller, as suggested in #548.

With the following code the problem can be reproduced:

from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.retrievers import SelfQueryRetriever
from langchain.llms import OpenAI
from langchain.chains.query_constructor.base import AttributeInfo

embeddings = OpenAIEmbeddings()

persist_directory = "data"
text = ["test"]

chunk_size = 1000
chunk_overlap = 10
# The lookbehind pattern is a raw string so that "\." is not read as an invalid escape sequence.
r_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap,
                                            separators=["\n\n", r"(?<=\. )", "\n"])
docs = r_splitter.create_documents(text)

for doc in docs:
    doc.metadata = {"document": "test"}

db = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory=persist_directory)

db.persist()

metadata_field_info = [
    AttributeInfo(
        name="document",
        description="The name of the document the chunk is from.",
        type="string",
    ),
]

document_content_description = "Test document"

llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    db,
    document_content_description,
    metadata_field_info,
    verbose=True
)

The spec file used to build the standalone application looks like this:

# -*- mode: python ; coding: utf-8 -*-

block_cipher = None

a = Analysis(
    ['test.py'],
    pathex=[],
    binaries=[],
    datas=[],
    hiddenimports=['tiktoken_ext', 'tiktoken_ext.openai_public', 'onnxruntime', 'chromadb', 'chromadb.telemetry.posthog', 'chromadb.api.local', 'chromadb.db.duckdb'],
    hookspath=['.'],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    cipher=block_cipher,
    noarchive=False,
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)

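# Raw string so that the backslashes in the Windows path are not treated as escapes (e.g. "\t" as a tab).
a.datas += Tree(r'path\to\langchain', prefix='langchain')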

exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    [],
    name='test',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    upx_exclude=[],
    runtime_tmpdir=None,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)

Can you help? Thanks in advance!

erezsh commented 10 months ago

Can you check if this also happens with Lark 1.1.5?

Mazzesy commented 10 months ago

Still the same error with Lark 1.1.5.

MegaIng commented 10 months ago

Seems to me like it's a langchain (or less likely, PyInstaller) issue, not a lark issue. Did you open an issue there?
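
One way to narrow this down might be to import lark directly at the top of the frozen script, before any langchain code runs. The message in the traceback comes from langchain's parser.py rather than from lark itself, so it may be hiding the underlying import failure. The sketch below is only an illustration: `diagnose_import` is a hypothetical helper, not part of lark, langchain, or PyInstaller, and it assumes nothing beyond the standard library and the `sys.frozen` attribute that PyInstaller sets in bundled applications.

```python
import importlib
import sys
import traceback


def diagnose_import(module_name):
    """Try to import module_name and return a short report string.

    Hypothetical helper for debugging only.
    """
    # PyInstaller's bootloader sets sys.frozen in a bundled application.
    frozen = getattr(sys, "frozen", False)
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Include the full original traceback, which a wrapper may otherwise hide.
        return f"{module_name}: FAILED (frozen={frozen})\n{traceback.format_exc()}"
    location = getattr(module, "__file__", "<no __file__>")
    return f"{module_name}: ok (frozen={frozen}), loaded from {location}"


if __name__ == "__main__":
    print(diagnose_import("lark"))
```

If the report says `FAILED` inside the executable but `ok` in the normal interpreter, the module was probably not bundled, which points at the PyInstaller configuration. If it says `ok` in both, the problem is more likely on the langchain side.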

Mazzesy commented 10 months ago

I opened an issue at langchain: url

erezsh commented 10 months ago

@Mazzesy I fixed your url here, but you made the same syntax mistake in the langchain issue. The title comes first, the url comes second.

Mazzesy commented 10 months ago

@erezsh thanks for the hint. I fixed it in the langchain issue.