nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

Support email formats such as EML, MBOX, and PST in LocalDocs #1033

Open peterchanws opened 1 year ago

peterchanws commented 1 year ago

Feature request

Support email formats such as EML, MBOX, and PST in LocalDocs.

h2oGPT supports EML: https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#supported-datatypes
privateGPT does the same: https://github.com/imartinez/privateGPT
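Until LocalDocs handles these formats natively, one possible workaround for mbox archives (Google Takeout, for example, exports Gmail as mbox) is to split them into one .eml file per message with Python's standard-library `mailbox` module. This is a minimal sketch, not GPT4All code: the `split_mbox` helper and the `message_00000.eml` naming are hypothetical. PST files are not covered, since Python's standard library cannot read them.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path: str, out_dir: str) -> list[Path]:
    """Split an mbox archive into one .eml file per message.

    Hypothetical helper: the function name and file naming are illustrative only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False raises mailbox.NoSuchMailboxError instead of creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    written = []
    try:
        for i, key in enumerate(box.keys()):
            # get_bytes() returns the raw message without the mbox "From " separator line.
            path = out / f"message_{i:05d}.eml"
            path.write_bytes(box.get_bytes(key))
            written.append(path)
    finally:
        box.close()
    return written
```

Per this thread, .eml is not among LocalDocs' default extensions, so the split files still need either that setting changed or a further conversion to plain text.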

Motivation

At Stanford, we collect email archives from accomplished individuals for research purposes. It would be great if LocalDocs supported email formats such as EML, MBOX, and PST. For privacy reasons, researchers can use the email archives only in our reading room.

Your contribution

I can test out the enhanced software.

rw86347 commented 1 year ago

Are you currently pointing GPT4All at other data, in any format? If so, can you give me a pointer on how to reference that data?

peterchanws commented 1 year ago

Here is the documentation for enabling LocalDocs:

https://docs.gpt4all.io/gpt4all_chat.html#localdocs-beta-plugin-chat-with-your-data

egmowens commented 1 year ago

I came across this issue while looking for something else related to localdocs, but I wanted to chime in because I also work in a library/archives environment, and we would be interested in this as well.

ericpwrusr commented 11 months ago

I am also interested in .pst files being supported.

Leivaster commented 9 months ago

Me too. Compatibility with Gmail would also be welcome.

n-engelhardt commented 2 months ago

Just for kicks, I changed the supported extensions to include EML and pointed LocalDocs at a folder with 3 EML files and the 3 PDF files that were originally attached to those emails. It's been several hours, and embedding is 90% complete. I'll see what happens when I try to use the data in a chat.

milver commented 2 months ago

What did you find? Mailboxes are the de facto knowledge management systems of every individual, so I'm very interested in this feature.

n-engelhardt commented 2 months ago

It didn't work very well. I tried stripping the attachments and extra headers, but the results still weren't as good as printing the email to PDF and using that.
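The preprocessing described above (dropping attachments and routing headers before indexing) can be sketched with Python's standard `email` package, writing one .txt file per message, assuming LocalDocs is set to index .txt files. This is a hypothetical helper, not GPT4All code: the `eml_to_text` and `convert_folder` names and the list of kept headers are assumptions, and nothing here suggests it would beat the print-to-PDF route reported above.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path

# Headers kept for retrieval; routing headers (Received, DKIM-Signature, ...) are dropped.
# This list is an assumption, not a GPT4All setting.
KEEP_HEADERS = ("From", "To", "Cc", "Date", "Subject")


def eml_to_text(eml_path: str) -> str:
    """Return a .eml message as plain text: selected headers plus the body, no attachments."""
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    lines = [f"{name}: {msg[name]}" for name in KEEP_HEADERS if msg[name] is not None]
    # get_body() picks the preferred body part and skips parts marked as attachments.
    body = msg.get_body(preferencelist=("plain", "html"))
    lines.append("")
    # HTML-only messages come back as raw markup here; tags are not stripped.
    lines.append(body.get_content().strip() if body is not None else "")
    return "\n".join(lines) + "\n"


def convert_folder(src_dir: str, dst_dir: str) -> int:
    """Write a .txt file into dst_dir for every .eml in src_dir; return how many were written."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for eml in sorted(Path(src_dir).glob("*.eml")):
        (dst / (eml.stem + ".txt")).write_text(eml_to_text(str(eml)), encoding="utf-8")
        count += 1
    return count
```

Because `get_body()` ignores attachment parts, attached PDFs would still need to be indexed separately, as in the experiment above.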