havr-p closed this issue 1 week ago.
Hi! Thanks a lot for your interest and for taking a look.
It appears I was perhaps overly enthusiastic when I created PersistDict to work around LangChain caching issues that have gone unfixed for months. It's basically just an SQLite database accessed like a Python dict, for convenience when caching things.
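For anyone unfamiliar with the idea, here is a minimal sketch of "an SQLite database accessed like a Python dict". It is a hypothetical `TinyPersistDict` built on `collections.abc.MutableMapping`, assuming one key/value table with pickled values; it is not PersistDict's actual code or API.

```python
import pickle
import sqlite3
from collections.abc import Iterator, MutableMapping
from typing import Any


class TinyPersistDict(MutableMapping):
    """Hypothetical dict-like wrapper around one SQLite table (not PersistDict's real API)."""

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        # A single key/value table; values are pickled so any picklable object fits.
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")
        self._conn.commit()

    def __getitem__(self, key: str) -> Any:
        row = self._conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key: str, value: Any) -> None:
        # INSERT OR REPLACE gives dict-style "overwrite existing key" semantics.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )
        self._conn.commit()

    def __delitem__(self, key: str) -> None:
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self._conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self) -> Iterator[str]:
        # Materialize the keys so callers can modify the dict while iterating.
        return iter([row[0] for row in self._conn.execute("SELECT key FROM kv")])

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self) -> None:
        self._conn.close()
```

Because the data lives in a file, reopening the same path in a later run gives back the cached values, which is the whole point of using it as a cache.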
If something goes wrong, try deleting the cache folder or disabling the cache (cf. the bottom of the --help output). Use --debug to see where the cache is located.
Please tell me if that helped!
Hi! I ran into a similar issue: wdoc fails to open the SQLite database. Here are the logs:
__ ____| | ___ ___
\ \ /\ / / _` |/ _ \ / __|
\ V V / (_| | (_) | (__
\_/\_/ \__,_|\___/ \___|
2024-11-02T00:33:43.514550+0100 INFO wdoc 8638483456 93318 printer 92 Bypassing model name matching for model 'openai/gpt-4o'
2024-11-02T00:33:43.514683+0100 INFO wdoc 8638483456 93318 printer 92 Bypassing model name matching for model 'openai/gpt-4o-mini'
2024-11-02T00:33:43.514834+0100 INFO wdoc 8638483456 93318 printer 92 Cache location: /Users/redacted/Library/Caches/wdoc
2024-11-02T00:33:43.514894+0100 INFO wdoc 8638483456 93318 printer 92 Log location: /Users/redacted/Library/Logs/wdoc
2024-11-02T00:33:43.514994+0100 INFO wdoc 8638483456 93318 printer 92 Loading model via litellm
2024-11-02T00:33:45.252845+0100 INFO wdoc 8638483456 93318 printer 92 Loading pdf: 'situationalawareness.pdf'
2024-11-02T00:33:45.320935+0100 INFO wdoc 8638483456 93318 printer 92 Trying to parse situationalawareness.pdf using pymupdf
2024-11-02T00:33:45.463420+0100 INFO wdoc 8638483456 93318 printer 92 Language probability after parsing situationalawareness.pdf: {'pymupdf': 0.8829599149299391}
2024-11-02T00:33:46.009417+0100 INFO wdoc 8638483456 93318 printer 92 Done loading all 1 documents in 0.76s
2024-11-02T00:33:46.009672+0100 INFO wdoc 8638483456 93318 printer 92 No document failed to load!
2024-11-02T00:33:46.009719+0100 INFO wdoc 8638483456 93318 printer 92 Deduplicating...
2024-11-02T00:33:46.009753+0100 INFO wdoc 8638483456 93318 printer 92 Getting all hash
2024-11-02T00:33:46.009799+0100 INFO wdoc 8638483456 93318 printer 92 Counting them
2024-11-02T00:33:46.009920+0100 INFO wdoc 8638483456 93318 printer 92 No duplicates!
2024-11-02T00:33:46.010211+0100 INFO wdoc 8638483456 93318 printer 92 Selected embedding model 'text-embedding-3-small' of backend openai
2024-11-02T00:33:46.020518+0100 DEBUG wdoc 8638483456 93318 _log 323 PersistDict:.__init__
2024-11-02T00:33:46.020735+0100 DEBUG wdoc 8638483456 93318 _log 323 PersistDict:.__init_table__
2024-11-02T00:33:46.020772+0100 DEBUG wdoc 8638483456 93318 _log 323 PersistDict:opening connection
2024-11-02T00:33:46.020881+0100 INFO wdoc 8638483456 93318 printer 92
--verbose was used so opening debug console at the appropriate frame. Press 'c' to continue to the frame of this print.
2024-11-02T00:33:46.023422+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/bin/wdoc", line 8, in <module>
sys.exit(cli_launcher())
^^^^^^^^^^^^^^
2024-11-02T00:33:46.023472+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/__main__.py", line 69, in cli_launcher
fire.Fire(wdoc)
2024-11-02T00:33:46.023507+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023570+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023601+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023629+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/utils/misc.py", line 701, in new_func
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023660+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/wdoc.py", line 522, in __init__
self.prepare_query_task()
2024-11-02T00:33:46.023691+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/wdoc.py", line 924, in prepare_query_task
self.loaded_embeddings, self.embeddings = load_embeddings(
^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023721+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/utils/embeddings.py", line 181, in load_embeddings
lfs = LocalFileStore(
^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023748+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/wdoc/utils/customs/compressed_embeddings_cache.py", line 59, in __init__
self.pdi = PersistDict(
^^^^^^^^^^^^
2024-11-02T00:33:46.023775+0100 INFO wdoc 8638483456 93318 printer 92 File "<@beartype(PersistDict.PersistDict.PersistDict.__init__) at 0x11dc0c2c0>", line 207, in __init__
2024-11-02T00:33:46.023801+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/PersistDict/PersistDict.py", line 100, in __init__
self.__init_table__()
2024-11-02T00:33:46.023828+0100 INFO wdoc 8638483456 93318 printer 92 File "<@beartype(PersistDict.PersistDict.PersistDict.__init_table__) at 0x11dc0c400>", line 12, in __init_table__
2024-11-02T00:33:46.023853+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/PersistDict/PersistDict.py", line 147, in __init_table__
conn = self.__connect__()
^^^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023879+0100 INFO wdoc 8638483456 93318 printer 92 File "/Users/redacted/.local/share/virtualenvs/wdoc_test-ePoOMSPS/lib/python3.11/site-packages/PersistDict/PersistDict.py", line 113, in __connect__
return sqlite3.connect(
^^^^^^^^^^^^^^^^
2024-11-02T00:33:46.023909+0100 INFO wdoc 8638483456 93318 printer 92 <class 'sqlite3.OperationalError'> : unable to open database file
2024-11-02T00:34:16.888374+0100 INFO wdoc 8638483456 93318 printer 92 You are now in the exception handling frame.
I tried running it with '--disable_llm_cache' and after deleting the cache folder, but got the same result. I also tried both the main and the dev branches.
I checked the database file itself and it seems to be empty? I'm not sure, as I'm not really familiar with LangChain:
> sqlite3 /Users/redacted/Library/Caches/wdoc/langchain.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> .tables
metadata storage
sqlite> SELECT * FROM storage;
sqlite>
sqlite> select * from metadata;
version|0.1.3
sqlite>
I ran into the same error while running the sample script. Creating the directory for the SQLite database did the trick:
mkdir /Users/redacted/Library/Caches/wdoc/CacheEmbeddings
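Why this works can be reproduced with the standard library alone, assuming (as the workaround suggests) that the database's parent folder was simply missing: SQLite creates the database file on demand but does not create missing directories, so `sqlite3.connect` fails with the same "unable to open database file" error seen in the log. The temporary paths are stand-ins, and the `cache.db` file name is hypothetical.

```python
import os
import sqlite3
import tempfile

# A cache root that exists, plus a subfolder that does not exist yet
# (a stand-in for the missing CacheEmbeddings directory; "cache.db" is hypothetical).
cache_root = tempfile.mkdtemp()
db_path = os.path.join(cache_root, "CacheEmbeddings", "cache.db")

# 1) Connecting fails: SQLite could create the file, but not its missing parent folder.
error_message = None
try:
    sqlite3.connect(db_path)
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print("first attempt:", error_message)

# 2) The workaround: create the directory first, then connect again.
os.makedirs(os.path.dirname(db_path))
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
conn.commit()
conn.close()
print("second attempt created the file:", os.path.exists(db_path))
```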
Hi. I'm really sorry, but it's an exceptionally busy week for me; I'm normally much more responsive!
I only have my phone with me, but I added one line to the latest dev branch that creates the parent directories of LocalFileStore's database. That should do it.
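A sketch of the "create the parents before opening the database" idea, assuming the fix amounts to a `mkdir` with parents right before connecting. `connect_creating_parents` is a hypothetical helper, not the line that was actually added to wdoc.

```python
import sqlite3
from pathlib import Path
from typing import Union


def connect_creating_parents(db_path: Union[str, Path]) -> sqlite3.Connection:
    """Hypothetical helper (not wdoc's code): open an SQLite database,
    creating any missing parent directories first."""
    path = Path(db_path)
    # parents=True creates intermediate folders; exist_ok=True makes repeat calls safe.
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)
```

Using `exist_ok=True` means the helper can be called on every startup without first checking whether the cache folder is already there.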
Sorry for the oversight and thanks a lot for bringing this to my attention.
@havr-p, thanks for kindly offering to help with Sentry. I've never used it, so I'm not sure it's actually needed, but if you care to explain why I should, I'll gladly read that!
Thanks to @aivisol and @blinkenl1ghts, this was much quicker to fix from my phone.
Don't hesitate to reopen!
This is included in the latest release.
Hello, I got an error when I tried to use wdoc in WSL.
I followed the standard way of installing wdoc through pip. Do you know what the reason could be? Maybe I can help you add some diagnostic tools to wdoc, e.g. Sentry.