Closed by MthwRobinson 1 year ago
Hi,
New to all of this GitHub and Python stuff, but ChatGPT is helping me get started.
I installed this GitHub repo into my PyCharm application (again, I'm a n00b). When I click/run `python ingest_data.py` as is (with all dependencies installed), I get this error message:
C:\Users\jvonr\PycharmProjects\chat-your-data\venv\Scripts\python.exe ingest_data.py
Traceback (most recent call last):
File "C:\Users\jvonr\PycharmProjects\chat-your-data\ingest_data.py", line 9, in
Is there a reason this isn't working straight after installation or am I just dumb and doing something wrong? Thanks!
@jvonreusner - `unstructured` uses `libmagic` for filetype detection. Since it looks like you're on Windows, I think you need `pip install python-magic-bin` instead of `python-magic`. We'll add an issue on the `unstructured` side to see if we can't clean that up for Windows `pip install`s.
Docs for that are here
Added https://github.com/Unstructured-IO/unstructured/issues/234 to address
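For anyone following along, here is a minimal sketch of the platform check described above. It only encodes the suggestion from this thread (`python-magic-bin` on Windows, `python-magic` elsewhere); the helper name `magic_package_for` is made up for illustration and is not part of `unstructured`.

```python
import platform


def magic_package_for(system: str) -> str:
    """Return the pip package suggested in this thread for an OS name.

    `system` is expected to be a value like platform.system() returns,
    e.g. "Windows", "Linux", or "Darwin".
    """
    # Assumption taken from the comment above: Windows needs the
    # python-magic-bin package, other platforms use python-magic.
    if system == "Windows":
        return "python-magic-bin"
    return "python-magic"


# Print the install command that matches the machine running this script.
print(f"pip install {magic_package_for(platform.system())}")
```

Running it on the machine in question prints the `pip install` line to try first.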
Wonderful - thank you so much.
I'm also having trouble verifying which version of python I need, and it doesn't seem to be clearly stated in the requirement documents.
I'm currently using the most up-to-date version for Windows, Python 3.11, but am getting error messages about my interpreter being invalid.
ChatGPT tried to give me an answer by telling me 3.6.3, but I think it has no idea what it's talking about lol
`3.6.3` is definitely wrong. `unstructured` won't work with versions `3.6` and older because of the `pytorch` dependency for the PDF partitioning model (though if you don't include the `local-inference` extra dependencies it won't pull that in and may work for you). We currently test against 3.8 and have an issue to add later Python versions to CI (this one here https://github.com/Unstructured-IO/unstructured/issues/145). We've gotten it working on `3.10` before and I wouldn't think `3.11` would be an issue.
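As a rough sketch of the version guidance above, the snippet below classifies an interpreter version. The thresholds come only from this comment (3.6 and older not expected to work, 3.8 tested in CI); the function name and the status labels are hypothetical, and anything other than 3.8 is reported as untested, not as broken.

```python
import sys

# Version the maintainers say they currently test against (from this thread).
TESTED_VERSION = (3, 8)
# Newest version this thread says is not expected to work.
LAST_UNSUPPORTED = (3, 6)


def version_status(version_info=sys.version_info) -> str:
    """Classify a Python version using only the guidance in this thread."""
    major_minor = tuple(version_info[:2])
    if major_minor <= LAST_UNSUPPORTED:
        return "not expected to work"
    if major_minor == TESTED_VERSION:
        return "tested in CI"
    # 3.7, 3.9, 3.10, 3.11, ...: may well work, but not covered by CI here.
    return "untested"


print(sys.version.split()[0], "->", version_status())
```

In PyCharm, running this with the project interpreter also confirms which Python the virtual environment is actually using.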
If you're running on Windows, I'd also check out these instructions from our docs. Long story short, the `detectron2` model we use for PDF partitioning doesn't support Windows, but there's a workaround you can use to get it running. We do intend to move to a new model for PDF partitioning in the near future that should be more Windows friendly.
Cool application! Raising a small PR to fix a `README` typo and add `beautifulsoup4` to `requirements.txt`. I get the following `ModuleNotFoundError` if it's not installed.
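A quick way to spot this kind of missing dependency before running the script is to check whether the modules can be found at all. This is only a sketch: `missing_modules` is a hypothetical helper, and the one assumption about the project is that `beautifulsoup4` is imported under its usual module name, `bs4`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when no installed module matches the name.
    return [name for name in names if importlib.util.find_spec(name) is None]


# beautifulsoup4 installs as the importable module "bs4".
missing = missing_modules(["bs4"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All checked modules are installed.")
```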