Mazzesy closed this issue 3 months ago.
🤖
Hi there,
Thanks for providing such a detailed description of the issue you're facing. It's strange that you're encountering an ImportError for lark even though you've installed it and added a hook for PyInstaller.
To help me understand the issue better, could you please provide the following additional information:
Could you confirm whether the lark library is installed in the same Python environment where you're running PyInstaller? You can check this by running `pip list` in your terminal and checking whether lark is listed.
Could you share the content of your `hook-lark.py` file? This will help me understand how you're instructing PyInstaller to handle the lark library.
Have you tried packaging a minimal Python script that only imports and uses lark, without any LangChain components, to see if the issue is specific to LangChain or a more general issue with lark and PyInstaller?
Could you share the full command you're using to run PyInstaller?
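To answer the first question from inside the interpreter that actually runs PyInstaller, rather than whichever `pip` is first on `PATH`, a small stdlib-only check can help. This is a sketch: the helper name `describe_module` is hypothetical, and it only reports where a module would be imported from and which version of the matching distribution is visible.

```python
import importlib.metadata
import importlib.util
import sys


def describe_module(module_name: str, dist_name: str) -> dict:
    """Report where `module_name` would be imported from in this interpreter."""
    spec = importlib.util.find_spec(module_name)
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # no *.dist-info metadata visible for this distribution
    return {
        "python": sys.executable,  # interpreter actually running this check
        "frozen": getattr(sys, "frozen", False),  # True inside a PyInstaller build
        "module_origin": spec.origin if spec else None,
        "dist_version": version,
    }


if __name__ == "__main__":
    for key, value in describe_module("lark", "lark").items():
        print(f"{key}: {value}")
```

Running it with the same launcher used for PyInstaller (for example `py`) shows whether that interpreter can see lark at all; `module_origin: None` means it cannot.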
Your answers to these questions will help me better understand the issue and hopefully find a solution. Thanks for your patience and cooperation!
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot I will gladly answer the questions:
2. The content of `hook-lark.py` is:
```python
from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files('lark')
```
3. The error does not occur when packaging a minimal Python script that only imports and uses lark.
4. The full command to run PyInstaller is `py -m PyInstaller main.spec`. The content of the main.spec file is described in the problem description.
🤖
Hi there,
I'm sorry to hear that you're having trouble with packaging your LangChain application using PyInstaller. I'll do my best to help you resolve this issue.
From your description, it seems like the problem might be related to how PyInstaller is handling the "lark" library when it's used in conjunction with LangChain. The fact that you're not encountering the same error when packaging a minimal Python script that only uses "lark" suggests that the issue might be specific to LangChain or how it interacts with "lark".
To help me better understand the problem, could you please provide the following additional information:
The exact code you're using to initiate the SelfQueryRetriever from LangChain. This will help me understand if there's anything in your code that might be causing the issue.
The content of your `main.spec` file. This will help me see if there's anything in your PyInstaller configuration that might be causing the issue.
Any error messages or stack traces that you're seeing when you try to package your application. This will help me pinpoint where the error is occurring.
The exact steps you're following to package your application. This will help me reproduce the issue on my end.
Once I have this information, I'll be in a better position to help you resolve this issue. Thanks for your patience and cooperation.
@dosu-bot The answers to your questions are:
1. The code used to initiate the `SelfQueryRetriever`:
```python
retriever = SelfQueryRetriever.from_llm(
    llm,
    db,
    document_content_description,
    metadata_field_info,
    verbose=True
)
```
2. The content of the `main.spec` file is:
```python
# -*- mode: python ; coding: utf-8 -*-

block_cipher = None

a = Analysis(
    ['test.py'],
    pathex=[],
    binaries=[],
    datas=[],
    hiddenimports=['tiktoken_ext', 'tiktoken_ext.openai_public', 'onnxruntime', 'chromadb', 'chromadb.telemetry.posthog', 'chromadb.api.local', 'chromadb.db.duckdb'],
    hookspath=['.'],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    cipher=block_cipher,
    noarchive=False,
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)

# Raw string, so that "\t" in the placeholder path is not read as a tab character.
a.datas += Tree(r'path\to\langchain', prefix='langchain')

exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    [],
    name='test',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    upx_exclude=[],
    runtime_tmpdir=None,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)
```
3. The `hook-lark.py` file is loaded during packaging, as the build output states: `Loading module hook 'hook-lark.py' from 'C:\\Users\\user\\PycharmProjects\\test'`
4. In the folder where the `main.spec` and `main.py` files are, I run the command `py -m PyInstaller main.spec`. From the resulting `dist` folder I start the exe file.
🤖
Hello,
Thank you for providing detailed information about your issue. Based on the context provided, it seems that your environment meets the requirements for the "lark" library as specified in the `pyproject.toml` file of LangChain version 0.0.233.
However, the issue you're experiencing might be due to how PyInstaller packages the "lark" library. PyInstaller sometimes has trouble correctly packaging certain Python libraries, especially those that use dynamic imports or have complex dependencies.
Here are a few things you can try:
Explicitly import the "lark" library in your script: If you're not already doing this, try adding `import lark` at the top of your script. This can sometimes help PyInstaller recognize the library and include it in the packaged application.
Add "lark" to the `hiddenimports` list in your .spec file: You can try adding `'lark'` to the `hiddenimports` list in your .spec file like so:
```python
hiddenimports=['tiktoken_ext', 'tiktoken_ext.openai_public', 'onnxruntime', 'chromadb', 'chromadb.telemetry.posthog', 'chromadb.api.local', 'chromadb.db.duckdb', 'lark'],
```
Check the "lark" library's location: Ensure that the "lark" library is installed in the same Python environment that you're using to run PyInstaller. You can check this by running `pip show lark` in your terminal and checking the `Location` field in the output.
Upgrade to the latest version of LangChain: The latest version of LangChain is v0.0.266. If possible, try upgrading to this version and see if the issue persists. There might have been fixes or improvements related to this issue in the newer versions.
Please try these suggestions and let me know if the issue persists. If it does, we'll continue to investigate and find a solution.
@dosu-bot I tried your suggestions but unfortunately the error persists.
🤖
Hello,
Thank you for providing detailed information about your issue. It seems like you've done a lot of troubleshooting already.
The `lark` library is listed as an optional dependency for LangChain. It's possible that LangChain is trying to import `lark` in a way that PyInstaller isn't able to detect, hence the `ImportError`.
One thing you could try is to explicitly include `lark` in the `hiddenimports` list in your spec file, if you haven't done so already. This tells PyInstaller to include `lark` in the packaged application, even if it can't detect the import:
```python
hiddenimports=['tiktoken_ext', 'tiktoken_ext.openai_public', 'onnxruntime', 'chromadb', 'chromadb.telemetry.posthog', 'chromadb.api.local', 'chromadb.db.duckdb', 'lark'],
```
If you've already done this and are still encountering the error, it's possible that `lark` is being imported dynamically in a way that PyInstaller can't detect. In this case, you could try using a PyInstaller hook to include `lark`. You mentioned that you've added a `hook-lark.py` file, but it's not clear where this file is located. The hook file should be placed in a directory that's included in the `hookspath` option in your spec file.
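A hook that asks PyInstaller for everything lark ships might look like the sketch below. It assumes a PyInstaller version that provides `collect_all` in `PyInstaller.utils.hooks` (4.x and later, as far as I know). The file must be named `hook-lark.py` and sit in a directory listed in `hookspath`, such as `hookspath=['.']` in the spec above. This is a configuration fragment that only runs inside PyInstaller, not a standalone script.

```python
# hook-lark.py: place in a directory listed in Analysis(hookspath=[...]).
# collect_all returns a (datas, binaries, hiddenimports) tuple for the package,
# covering its submodules, its non-Python data files (e.g. lark's *.lark
# grammars) and, if it can find them, the package's metadata files.
from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all('lark')
```

When building directly from a script instead of a spec file, the command-line option `--collect-all lark` asks for the same collection.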
If none of these suggestions work, it would be helpful if you could provide the exact traceback of the error, as well as the part of your code where you're importing and using LangChain. This could help pinpoint the exact cause of the issue.
I hope this helps! Let me know if you have any other questions.
@dosu-bot
Adding `lark` to `hiddenimports` did not solve the issue. The path to `hook-lark.py` is defined in the spec file's `hookspath` option, but the `ImportError` persists.
The exact traceback of the error is:
```
Traceback (most recent call last):
  File "test.py", line 39, in <module>
  File "langchain\retrievers\self_query\base.py", line 144, in from_llm
  File "langchain\chains\query_constructor\base.py", line 154, in load_query_constructor_chain
  File "langchain\chains\query_constructor\base.py", line 115, in _get_prompt
  File "langchain\chains\query_constructor\base.py", line 72, in from_components
  File "langchain\chains\query_constructor\parser.py", line 150, in get_parser
ImportError: Cannot import lark, please install it with 'pip install lark'.
```
The code example where the error occurs is:
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.retrievers import SelfQueryRetriever
from langchain.llms import OpenAI
from langchain.chains.query_constructor.base import AttributeInfo

embeddings = OpenAIEmbeddings()
persist_directory = "data"
text = ["test"]
chunk_size = 1000
chunk_overlap = 10

# Raw string for the regex separator, so "\." is not treated as a string escape.
r_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap,
                                            separators=["\n\n", r"(?<=\. )", "\n"])
docs = r_splitter.create_documents(text)
for doc in docs:
    doc.metadata = {"document": "test"}

db = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory=persist_directory)
db.persist()

metadata_field_info = [
    AttributeInfo(
        name="document",
        description="The name of the document the chunk is from.",
        type="string",
    ),
]
document_content_description = "Test document"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    db,
    document_content_description,
    metadata_field_info,
    verbose=True
)
```
🤖
Hello,
From the traceback you provided, it seems like the issue is related to the "lark" library not being found when your application is packaged with PyInstaller. This is despite the fact that you've confirmed that "lark" is installed in the same Python environment where you're running PyInstaller and you've added a `hook-lark.py` file to PyInstaller.
The `hiddenimports` list in the .spec file is used to add Python modules that PyInstaller is unable to detect automatically. Since you've already added `'lark'` to the `hiddenimports` list, PyInstaller should be able to find it.
However, the issue might be related to the way PyInstaller packages the "lark" library. PyInstaller creates a bundle containing the Python interpreter, the application, and all the dependencies needed to run the application. It's possible that the "lark" library is not being correctly included in the bundle.
One thing you could try is to explicitly include the "lark" library in the `datas` list in the .spec file. This list is used to specify non-Python files that should be included in the bundle. You can use the `collect_data_files` function from `PyInstaller.utils.hooks` to collect all the data files from the "lark" library. Here's how you can do it:
```python
from PyInstaller.utils.hooks import collect_data_files

a = Analysis(
    # ... your existing Analysis arguments ...
    datas=collect_data_files('lark'),
    # ...
)
```
This will include all the data files from the "lark" library in the bundle, which might solve the issue.
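One way to check whether the data files actually reached the bundle is to look for lark's grammar files at runtime, from inside the built exe. The sketch below assumes lark keeps them under `lark/grammars/` (true for the lark 1.x releases I know of; verify against your installed copy). It also assumes that PyInstaller's importer reports the package directory inside the bundle, which recent versions do. The helper name is hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import List, Optional


def lark_grammar_files(package: str = "lark") -> Optional[List[str]]:
    """Return the *.lark grammar files shipped with `package`, or None if it is not importable."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not importable from this interpreter / bundle
    names: List[str] = []
    for location in spec.submodule_search_locations:
        grammar_dir = Path(location) / "grammars"
        # glob() on a missing directory simply yields nothing
        names.extend(sorted(p.name for p in grammar_dir.glob("*.lark")))
    return names


if __name__ == "__main__":
    print(lark_grammar_files())
```

In the frozen application, `None` would mean lark itself is missing, while an empty list would point at missing data files.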
If this doesn't work, it would be helpful to know the exact version of the "lark" library that is installed in your Python environment. You can find this by running `pip show lark` in your Python environment. This information could help us further diagnose the issue.
Please let me know if this helps or if you need further assistance.
@dosu-bot
The issue persists. The exact version of `lark` is 1.1.7.
I'm having the same issue.
I had this issue and solved it with the following steps:
1. `pip install lark` (I have installed version 1.1.7)
2. `pip install lark-parser` (I have installed version 0.12.0)

For context, I was running the following code:
```python
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True,
)
```
which generated the following error message:
```
ImportError                               Traceback (most recent call last)
Cell In[60], line 3
      1 document_content_description = "Lecture notes"
      2 llm = OpenAI(temperature=0)
----> 3 retriever = SelfQueryRetriever.from_llm(
      4     llm,
      5     vectordb,
      6     document_content_description,
      7     metadata_field_info,
      8     verbose=True,
      9 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/retrievers/self_query/base.py:166, in SelfQueryRetriever.from_llm(cls, llm, vectorstore, document_contents, metadata_field_info, structured_query_translator, chain_kwargs, enable_limit, use_original_query, **kwargs)
    162 if "allowed_operators" not in chain_kwargs:
    163     chain_kwargs[
    164         "allowed_operators"
    165     ] = structured_query_translator.allowed_operators
--> 166 llm_chain = load_query_constructor_chain(
    167     llm,
    168     document_contents,
    169     metadata_field_info,
    170     enable_limit=enable_limit,
    171     **chain_kwargs,
    172 )

...

    154     allowed_comparators=allowed_comparators, allowed_operators=allowed_operators
    155 )
    156 return Lark(GRAMMAR, parser="lalr", transformer=transformer, start="program")

ImportError: Cannot import lark, please install it with 'pip install lark'.
```
Thanks to tsbertalan for this helpful comment
Unfortunately, the problem persists when using PyInstaller to create the exe file.
Hi, @Mazzesy,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you encountered an ImportError related to the "lark" library when packaging an application using PyInstaller. Despite trying to install "lark" and adding a hook file to PyInstaller, the issue persists. The expected behavior is for the SelfQueryRetriever to be initiated properly for further use. You've provided detailed information about your environment, code, and the steps you've taken to troubleshoot the issue. Other users have also reported similar issues and provided potential solutions, but the problem remains unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!
@dosu-bot With LangChain version 0.1.15 I get a different error, shown below, even though `chromadb` is installed.
```
Traceback (most recent call last):
  File "langchain_community\vectorstores\chroma.py", line 81, in __init__
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "chromadb\__init__.py", line 5, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "chromadb\auth\token\__init__.py", line 26, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "chromadb\telemetry\opentelemetry\__init__.py", line 6, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "opentelemetry\sdk\resources\__init__.py", line 144, in <module>
  File "importlib_metadata\__init__.py", line 909, in version
  File "importlib_metadata\__init__.py", line 882, in distribution
  File "importlib_metadata\__init__.py", line 410, in from_name
importlib_metadata.PackageNotFoundError: No package metadata was found for opentelemetry-sdk

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test2.py", line 25, in <module>
  File "langchain_community\vectorstores\chroma.py", line 778, in from_documents
  File "langchain_community\vectorstores\chroma.py", line 714, in from_texts
  File "langchain_community\vectorstores\chroma.py", line 84, in __init__
ImportError: Could not import chromadb python package. Please install it with `pip install chromadb`.
[6724] Failed to execute script 'test2' due to unhandled exception!
```
Hey @Mazzesy, did you find a way to include lark in PyInstaller yet?
To resolve the `ImportError` related to the `chromadb` library when using LangChain version 0.1.15 to create a standalone application with PyInstaller, despite having `chromadb` installed, you should ensure that PyInstaller is correctly including the `chromadb` package in the bundled application. This issue often arises because PyInstaller does not automatically detect and include all dependencies, especially those imported dynamically or conditionally.
Here are steps to resolve the issue:
Explicit Import: Ensure that `chromadb` is explicitly imported in your script if it's not already. This can help PyInstaller detect the dependency. However, since `chromadb` is conditionally imported within the LangChain library, this step alone might not suffice.
Use a PyInstaller Hook: Create a hook file for PyInstaller to explicitly tell it to include `chromadb`. Create a file named `hook-chromadb.py` with the following content:
```python
from PyInstaller.utils.hooks import collect_submodules, collect_data_files

hiddenimports = collect_submodules('chromadb')
datas = collect_data_files('chromadb')
```
Place this hook in a directory and then point PyInstaller to this directory using the `--additional-hooks-dir` option when running PyInstaller:
```
pyinstaller --additional-hooks-dir=<path_to_hooks_directory> your_script.py
```
Use the `--hidden-import` Option: Alternatively, you can instruct PyInstaller to include `chromadb` directly via the command line using the `--hidden-import` option:
```
pyinstaller --hidden-import=chromadb --hidden-import=chromadb.config your_script.py
```
This tells PyInstaller to consider `chromadb` and `chromadb.config` as hidden imports and include them in the bundle.
Verify `chromadb` Installation: Ensure that `chromadb` is installed in the same Python environment you're using to run PyInstaller. You can install it using pip if it's not already installed:
```
pip install chromadb
```
Check PyInstaller Documentation for Dynamic Imports: If `chromadb` uses dynamic imports internally (imports that are not statically analyzable), you might need to explore more advanced PyInstaller options to ensure these are included. Refer to the PyInstaller documentation for handling dynamic imports.
By following these steps, you should be able to resolve the `ImportError` for `chromadb` when bundling your application with PyInstaller.
As for your second question, I wasn't able to find any known issues with PyInstaller or the `chromadb` library in LangChain version 0.1.15 within the repository. However, it's possible that there may be issues that I wasn't able to find. If the problem persists after trying the above steps, please provide more details about your setup and the exact steps you're taking to bundle your application, and I'll do my best to assist you further.
No, it's not related to chromadb; I just want my PyInstaller exe file to import lark correctly.
@Soham-Kakade
When adding `a.datas += Tree('path\\to\\lark', prefix='lark')` to the spec file, the `lark` library seems to be included correctly. But for me, with the script above, the problem with `chromadb` remains, even though it is included in the packed folder.
@dosubot
I tried your suggestions, but the problem remains. My spec file looks like this:
```python
# -*- mode: python ; coding: utf-8 -*-

a = Analysis(
    ['test.py'],
    pathex=[],
    binaries=[],
    datas=[],
    hiddenimports=['opentelemetry_sdk', 'chromadb.config', 'tiktoken_ext', 'tiktoken_ext.openai_public', 'onnxruntime', 'chromadb', 'chromadb.telemetry.posthog', 'chromadb.api.local', 'chromadb.db.duckdb'],
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    noarchive=False,
)
pyz = PYZ(a.pure)

a.datas += Tree('path\\to\\langchain', prefix='langchain')
a.datas += Tree('path\\to\\chromadb', prefix='chromadb')
a.datas += Tree('path\\to\\lark', prefix='lark')

exe = EXE(
    pyz,
    a.scripts,
    [],
    exclude_binaries=True,
    name='test',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)
coll = COLLECT(
    exe,
    a.binaries,
    a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name='test',
)
```
Based on the traceback you provided, it seems like the `ImportError` is related to the `opentelemetry-sdk` package, which is a dependency of `chromadb`. The error message `importlib_metadata.PackageNotFoundError: No package metadata was found for opentelemetry-sdk` suggests that PyInstaller is unable to find the metadata for `opentelemetry-sdk`, which is likely causing the subsequent `ImportError` for `chromadb`.
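According to the traceback, `opentelemetry.sdk.resources` looks up the installed version of `opentelemetry-sdk` when it is imported, and that lookup needs the package's `*.dist-info` folder, which PyInstaller does not bundle unless asked to. The same lookup can be repeated for several distributions at once from inside the frozen exe. In the sketch below, the helper name is hypothetical and the list of names is an assumption taken from this thread.

```python
import importlib.metadata


def missing_metadata(dist_names):
    """Return the distributions whose metadata cannot be found at runtime."""
    missing = []
    for name in dist_names:
        try:
            importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Distribution names taken from the traceback and thread; adjust to your app.
    print(missing_metadata(["opentelemetry-sdk", "chromadb", "lark"]))
```

Any name this prints from inside the built application is a candidate for `copy_metadata` in the spec.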
To resolve this issue, you could try the following steps:
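Ensure Correct Hidden Imports: You've mentioned adding `chromadb` to the `hiddenimports` list, which is correct. However, given that `chromadb` and `chromadb.config` are imported conditionally within a try-except block, ensure that both `chromadb` and `chromadb.config` are explicitly listed in the `hiddenimports` section of your `.spec` file. Note that `hiddenimports` takes importable module names, so the entry for the OpenTelemetry SDK would be `opentelemetry.sdk`, not the distribution name `opentelemetry-sdk` or `opentelemetry_sdk`. A hidden import alone will not fix this particular error, though: `PackageNotFoundError` means the package's metadata (its `*.dist-info` folder) is missing from the bundle. Adding `copy_metadata('opentelemetry-sdk')` from `PyInstaller.utils.hooks` to `datas` (or passing `--copy-metadata opentelemetry-sdk` when building from a script) is what bundles that metadata.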
Use Hooks: If the above step does not resolve the issue, consider creating hook files for `chromadb` and the OpenTelemetry SDK. PyInstaller allows you to create custom hooks to specify additional files or modules to include. Hook files are named after the importable module, so create `hook-chromadb.py` and `hook-opentelemetry.sdk.py` in a directory named `hooks` in your project. Inside these hook files, you can specify additional imports or datas:
```python
from PyInstaller.utils.hooks import collect_submodules, collect_data_files

hiddenimports = collect_submodules('chromadb')
datas = collect_data_files('chromadb')
```
And for the OpenTelemetry SDK, note that `collect_submodules` takes the module name (`opentelemetry.sdk`), while `copy_metadata` takes the distribution name (`opentelemetry-sdk`) and bundles the metadata the traceback complains about:
```python
from PyInstaller.utils.hooks import collect_submodules, copy_metadata

hiddenimports = collect_submodules('opentelemetry.sdk')
datas = copy_metadata('opentelemetry-sdk')
```
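Then point PyInstaller at the directory containing your custom hooks. When building from a `.spec` file, set this in the spec itself (`hookspath=['hooks']` in `Analysis`), because options that control spec generation, such as `--additional-hooks-dir`, are ignored or rejected (depending on the PyInstaller version) when a `.spec` file is passed. The command-line option is for building directly from a script:
```
pyinstaller --additional-hooks-dir=hooks your_script.py
```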
Check for Subdependencies: Sometimes, the issue might not be with `chromadb` directly but with its dependencies. Ensure that all subdependencies are also included in the `hiddenimports` if necessary.
Review PyInstaller Warnings: PyInstaller will display warnings for any modules or files it cannot find. Review the warnings after building your application to see if there are any clues about missing modules or files related to `chromadb`.
Test in a Clean Environment: Sometimes, local environment configurations can affect the build process. Consider testing the packaging process in a clean virtual environment to rule out environment-specific issues.
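Putting these pieces together, a spec sketch might collect lark's data files, chromadb's submodules and data, and the `opentelemetry-sdk` metadata explicitly. This is untested against your project: the script name `test.py` comes from the spec above, and the set of packages that need `copy_metadata` is an assumption you may have to extend as further `PackageNotFoundError` messages appear. It is a spec fragment (only `Analysis` is shown) and runs only inside PyInstaller.

```python
# main.spec (fragment): sketch, untested. PYZ/EXE/COLLECT stay as before.
from PyInstaller.utils.hooks import (
    collect_data_files,
    collect_submodules,
    copy_metadata,
)

datas = []
datas += collect_data_files('lark')          # lark's *.lark grammar files
datas += collect_data_files('chromadb')      # chromadb's non-Python files
datas += copy_metadata('opentelemetry-sdk')  # dist-info looked up at import time

hiddenimports = ['lark']
hiddenimports += collect_submodules('chromadb')

a = Analysis(
    ['test.py'],
    datas=datas,
    hiddenimports=hiddenimports,
    hookspath=[],
    noarchive=False,
)
```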
If these steps do not resolve the issue, it might be helpful to know more about your environment. Could you provide the versions of Python, PyInstaller, and `chromadb` you're using? Also, are you able to successfully import `chromadb` and `opentelemetry.sdk` (from the `opentelemetry-sdk` distribution) in a Python shell outside of PyInstaller?
Regarding the role of `chromadb` in LangChain, it serves as a database specifically designed for building AI applications that utilize embeddings. In the LangChain framework, it is used to facilitate operations related to vector databases, such as semantic search or example selection, through a wrapper class named `Chroma`. This wrapper allows LangChain to interact with Chroma vector databases seamlessly. Within the `SelfQueryRetriever` class, `chromadb` (via the `Chroma` wrapper) is likely used to retrieve and possibly store embeddings that are relevant to the self-query processes. This could involve searching through embeddings to find the most relevant information or examples based on the query provided to the `SelfQueryRetriever`.
I hope this helps! Let me know if you have any other questions or if there's anything else I can do to assist you.
@Mazzesy My spec file looks like this:
```python
# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_data_files

a = Analysis(
    ['UAT_SQL_Dependency_Removed.py'],
    pathex=[],
    binaries=[],
    datas=collect_data_files('lark'),
    hiddenimports=['lark'],
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    excludes=[],
    noarchive=False,
)
pyz = PYZ(a.pure)

a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/langchain', prefix='langchain')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/pinecone', prefix='pinecone')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/lark', prefix='lark')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/tiktoken', prefix='tiktoken')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/torch', prefix='torch')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/txtai', prefix='txtai')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/win32com', prefix='win32com')
a.datas += Tree('D:/Anaconda-python/envs/depe/Lib/site-packages/sqlalchemy', prefix='sqlalchemy')

exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.datas,
    [],
    name='UAT_SQL_Dependency_Removed',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    upx_exclude=[],
    runtime_tmpdir=None,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)
```
And I'm getting this error:
```
Traceback (most recent call last):
  File "UAT_SQL_Dependency_Removed.py", line 184, in <module>
    retriever = SelfQueryRetriever.from_llm(
  File "langchain\retrievers\self_query\base.py", line 225, in from_llm
  File "langchain\chains\query_constructor\base.py", line 357, in load_query_constructor_runnable
  File "langchain\chains\query_constructor\base.py", line 99, in from_components
  File "langchain\chains\query_constructor\parser.py", line 174, in get_parser
ImportError: Cannot import lark, please install it with 'pip install lark'.
[18828] Failed to execute script 'UAT_SQL_Dependency_Removed' due to unhandled exception!
```
Is this a lark error or something else?
@Soham-Kakade
The `ImportError` with `lark` occurred for me when using langchain 0.0.233. With 0.1.15 I get the error with `chromadb`.
I have langchain 0.0.303, and I'm unable to install langchain 0.1.15, as the latest distribution is 0.1.11.
Tried all the methods, and this works:
- `%pip install --upgrade --quiet lark chromadb`
- restart the kernel

References: https://python.langchain.com/docs/modules/data_connection/retrievers/self_query
Hello everyone! Unfortunately, I'm still having the same error after reinstalling and restarting the kernel...
With the following code the problem can be reproduced:
```
!pip install --upgrade --quiet lark chromadb langchain
# Python 3.10.0
!pip list | grep -E 'lark|chroma|langchain|openai'
```
```
chroma-hnswlib 0.7.3
chromadb       0.5.0
langchain      0.0.27
lark           1.1.9
openai         1.23.6
```
```python
# vectorstores and embeddings
loaders = [
    loader,  # road to react pdf loader
    PyPDFLoader('the-road-to-learn-react.pdf')  # another react book, might have overlapped content, treat as dirty data
]
docs = []
for l in loaders:
    docs.extend(l.load())

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
splits = text_splitter.split_documents(docs)

# create a directory to store
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,  # openai embeddings
    persist_directory='...'  # some local location
)

document_content_description = "only from intro to react"
llm = OpenAI(temperature=0, openai_api_key=api_keys['OPENAI_API_KEY'])
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)
```
```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[46], line 3
      1 document_content_description = "only from intro to react"
      2 llm = OpenAI(temperature=0,openai_api_key=api_keys['OPENAI_API_KEY'])
----> 3 retriever = SelfQueryRetriever.from_llm(
      4     llm,
      5     vectordb,
      6     document_content_description,
      7     metadata_field_info,
      8     verbose=True
      9 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/langchain/retrievers/self_query/base.py:240, in SelfQueryRetriever.from_llm(cls, llm, vectorstore, document_contents, metadata_field_info, structured_query_translator, chain_kwargs, enable_limit, use_original_query, **kwargs)
    233 if (
    234     "allowed_operators" not in chain_kwargs
    235     and structured_query_translator.allowed_operators is not None
    236 ):
    237     chain_kwargs[
    238         "allowed_operators"
    239     ] = structured_query_translator.allowed_operators
--> 240 query_constructor = load_query_constructor_runnable(
    241     llm,
    242     document_contents,
    243     metadata_field_info,
    244     enable_limit=enable_limit,
    245     **chain_kwargs,
    246 )
    247 return cls(
    248     query_constructor=query_constructor,
    249     vectorstore=vectorstore,
   (...)
    252     **kwargs,
    253 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/langchain/chains/query_constructor/base.py:358, in load_query_constructor_runnable(llm, document_contents, attribute_info, examples, allowed_comparators, allowed_operators, enable_limit, schema_prompt, fix_invalid, **kwargs)
    354 for ainfo in attribute_info:
    355     allowed_attributes.append(
    356         ainfo.name if isinstance(ainfo, AttributeInfo) else ainfo["name"]
    357     )
--> 358 output_parser = StructuredQueryOutputParser.from_components(
    359     allowed_comparators=allowed_comparators,
    360     allowed_operators=allowed_operators,
    361     allowed_attributes=allowed_attributes,
    362     fix_invalid=fix_invalid,
    363 )
    364 return prompt | llm | output_parser

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/langchain/chains/query_constructor/base.py:99, in StructuredQueryOutputParser.from_components(cls, allowed_comparators, allowed_operators, allowed_attributes, fix_invalid)
     96     return fixed
     98 else:
---> 99     ast_parse = get_parser(
    100         allowed_comparators=allowed_comparators,
    101         allowed_operators=allowed_operators,
    102         allowed_attributes=allowed_attributes,
    103     ).parse
    104 return cls(ast_parse=ast_parse)

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/langchain/chains/query_constructor/parser.py:174, in get_parser(allowed_comparators, allowed_operators, allowed_attributes)
    172 # QueryTransformer is None when Lark cannot be imported.
    173 if QueryTransformer is None:
--> 174     raise ImportError(
    175         "Cannot import lark, please install it with 'pip install lark'."
    176     )
    177 transformer = QueryTransformer(
    178     allowed_comparators=allowed_comparators,
    179     allowed_operators=allowed_operators,
    180     allowed_attributes=allowed_attributes,
    181 )
    182 return Lark(GRAMMAR, parser="lalr", transformer=transformer, start="program")

ImportError: Cannot import lark, please install it with 'pip install lark'.
```
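The last frame shows why the message can be misleading: `get_parser` raises the generic "Cannot import lark" error whenever `QueryTransformer is None`, i.e. whenever the earlier lark import failed, so the original exception is never shown. A hedged diagnostic is to perform the import directly and build a tiny LALR parser, which also exercises lark's bundled `common` grammar, so that the underlying traceback is printed instead. The helper name and test grammar are illustrative and not part of LangChain; the snippet can be run in the notebook or added temporarily to the script that gets frozen.

```python
import traceback

# Tiny grammar that imports terminals from lark's bundled common.lark file.
TEST_GRAMMAR = r"""
start: WORD+
%import common.WORD
%import common.WS
%ignore WS
"""


def probe_lark() -> str:
    """Return 'ok', or the full traceback text of the first lark failure."""
    try:
        from lark import Lark  # the same package LangChain's parser needs
        parser = Lark(TEST_GRAMMAR, parser="lalr", start="start")
        parser.parse("hello world")
    except Exception:  # report any failure, not only ImportError
        return traceback.format_exc()
    return "ok"


if __name__ == "__main__":
    print(probe_lark())
```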
System Info
When attempting to package my application using PyInstaller, I encounter an error related to the "lark" library. When trying to initiate the SelfQueryRetriever from langchain, I encounter the following problem:
I use:
I have already ensured that the "lark" library is installed using the appropriate command: `pip install lark`.
I have also tried to add a `hook-lark.py` file to PyInstaller as suggested here and have also opened an issue in Lark.
Can you help? Thanks in advance!
Who can help?
No response
Information
Related Components
Reproduction
With the following code the problem can be reproduced:
The spec-file to create the standalone application looks like this:
Expected behavior
The `SelfQueryRetriever` should be initiated properly for further use.