Use .env files and import pypdf

techleadhd / chatgpt-retrieval

1.65k stars 787 forks source link

Use .env files and import pypdf #13

Closed jasuca closed 10 months ago

jasuca commented 1 year ago

Added pypdf as requirement to read pdf Created a requirements.txt file Use .env to avoid commit OpenAI keys

techleadhd commented 1 year ago

Is pypdf necessary, or does unstructured include that in its dependencies?
The dependencies in requirements.txt (wrapt, ipython, pythondotenv) don't seem to match the pip install command...?
typo on "You can also use" to "You can use"

jasuca commented 1 year ago

pypdf is necessary. Without manually using was not reading pdf correctly.
Sorry, I updated the command. iphython is not necessary. wrapt is necessary to enforce the 1.15. The 1.14 doesn't support the latest python version. pythondotenv is for the .env support
Typo fixed

StrictLine commented 11 months ago

I like the idea adding the missing requirements.txt to the project, many of us misses it.

techleadhd commented 10 months ago

My comments here are similar to https://github.com/techleadhd/chatgpt-retrieval/pull/31.

This might be a matter of preference, but to keep things simple, I'm favoring keeping the api key in the code, since setting up env vars is one more thing to learn for new folks. Not using the env vars also makes the code more compatible with web apps, which may not be running under the user's environment. Let me know if there is a strong reason otherwise.

StrictLine commented 10 months ago

Should we create another PR for the missing dependencies?

techleadhd commented 10 months ago

i was able to run this without pythondotenve or wrapt... are you sure those are needed...?

StrictLine commented 10 months ago

No, for me pypdf was necessary, but I'd need to re-test it.

StrictLine commented 10 months ago

No, for me pypdf was necessary, but I'd need to re-test it.

✗ python chatgpt.py "what is my cat's name"
Using embedded DuckDB without persistence: data will be transient
I'm sorry, but I don't know what your cat's name is.

This is the result after a fresh cloning from GitHub and using venv with the installation command in the README: pip install langchain openai chromadb tiktoken unstructured

But my assumption was wrong, installing the pypdf does not solve the problem. It's related to the DirectoryLoader, which "uses the UnstructuredLoader" under the hood, but directly using this works like a charm. So the underlying implementation has changed without updating the documentation.