pashpashpash / vault-ai

OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database). Upload your own custom knowledge base files (PDF, txt, epub, etc) using a simple React frontend.
https://vault.pash.city
MIT License

Issue: pdftotext not found in %PATH% #10

Open shaiss opened 1 year ago

shaiss commented 1 year ago

I don't see any mention of it in this repo, but I can't upload files because I get this error:


          GET /js/components_Pages_LandingPage_index_jsx.bundle.js
2023/04/18 14:44:58 [UploadHandler] UUID= b9258226-d99b-4925-a100-c6c065555aa1
2023/04/18 14:44:58 [UploadHandler ERR] Error extracting text from PDF exec: "pdftotext": executable file not found in %PATH%
[negroni] Apr 18 14:44:58 | 200 | 26.1222ms
          POST /upload

I've copied pdftotext.exe and updated the path. The problem is that on Windows it needs to be called as ./pdftotext.exe. Would it be plausible to switch to https://pypi.org/project/poppler-utils/ to simplify deployment? This would help with containerizing the app.

It looks like pdfinfo.exe is needed too.

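For a quick fix I added import "os" and appended the folder to PATH at the top of the upload function:
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
    // Quick fix: append the folder that holds pdftotext.exe to PATH (";" is the Windows separator).
    os.Setenv("PATH", os.Getenv("PATH")+";C:\\Users\\[username]")
    // ... rest of the handler unchanged ...
}
```
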
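
The same quick fix can be written without hardcoding the Windows `;` separator. The following is only a sketch under assumptions: `appendToPath` is a hypothetical helper and `POPPLER_BIN` is a hypothetical environment variable naming the folder that contains the Poppler binaries; neither exists in vault-ai.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// appendToPath is a hypothetical helper: it returns pathVar with dir appended,
// joined by sep (";" on Windows, ":" on Unix-like systems).
// If dir is already listed, pathVar is returned unchanged.
func appendToPath(pathVar, dir, sep string) string {
	if pathVar == "" {
		return dir
	}
	for _, entry := range strings.Split(pathVar, sep) {
		if entry == dir {
			return pathVar // already present, nothing to do
		}
	}
	return pathVar + sep + dir
}

func main() {
	// POPPLER_BIN is an assumed variable name, not something the project defines.
	dir := os.Getenv("POPPLER_BIN")
	if dir == "" {
		fmt.Println("POPPLER_BIN not set; leaving PATH unchanged")
		return
	}
	// os.PathListSeparator is the platform's PATH separator.
	sep := string(os.PathListSeparator)
	if err := os.Setenv("PATH", appendToPath(os.Getenv("PATH"), dir, sep)); err != nil {
		fmt.Println("could not update PATH:", err)
		return
	}
	fmt.Println("PATH updated")
}
```

Doing this once at startup rather than inside `UploadHandler` would avoid growing PATH on every upload, although the duplicate check above also guards against that.
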
pashpashpash commented 1 year ago

@shaiss I forgot to include this dependency in the README. You need to install Poppler:

Ubuntu:

sudo apt-get install -y poppler-utils

Mac:

brew install poppler
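
To confirm that the install worked, a small startup check along these lines could report missing tools before any upload is attempted. This is a sketch, not code from vault-ai: `missingTools` is a hypothetical helper, and the list of required binaries (`pdftotext`, `pdfinfo`) is taken from this thread rather than from the project source. It relies on the standard library's `exec.LookPath`, which searches PATH the same way `exec.Command` does.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools is a hypothetical helper: it returns every name in tools
// for which found reports false, preserving the input order.
func missingTools(tools []string, found func(name string) bool) []string {
	var missing []string
	for _, t := range tools {
		if !found(t) {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	// Binaries mentioned in this thread; the real list may differ.
	required := []string{"pdftotext", "pdfinfo"}

	// onPath reports whether exec.LookPath can resolve the executable via PATH.
	onPath := func(name string) bool {
		_, err := exec.LookPath(name)
		return err == nil
	}

	if missing := missingTools(required, onPath); len(missing) > 0 {
		fmt.Printf("missing Poppler tools: %v (install poppler-utils or poppler)\n", missing)
		return
	}
	fmt.Println("all Poppler tools found")
}
```

Passing the lookup in as a function keeps the check easy to exercise without Poppler installed.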

shaiss commented 1 year ago

@pashpashpash ty for the reply. Would it be plausible to switch to https://pypi.org/project/poppler-utils/ to simplify deployment? This would help with containerizing the app.

pashpashpash commented 1 year ago

@shaiss what changes do you recommend? I'd be happy to merge a PR if you want to try your hand at it

shaiss commented 1 year ago

From my understanding, the Poppler utils can be installed either as a Python module or separately. As a Python/Node module we could include it as part of the build, which would make containerization easier.

I'll have to take a stab at it, but there's likely some experience with this out there in the community.

atcebrian commented 1 year ago

I followed this tutorial for windows: https://www.reddit.com/r/ChatGPT/comments/12qbrmw/comment/jhptv12/?utm_source=reddit&utm_medium=web2x&context=3

and I'm getting the same error:


 Error extracting text from PDF exec: "pdftotext": executable file not found in %PATH%
[negroni] Apr 18 14:44:58 | 200 | 26.1222ms
          POST /upload