memaryParse - GitHub Issues

memary currently parses the agents' responses, which are stored in a .txt file, before inserting them into our knowledge graphs.

As we look to support agentic systems running real-world tasks, our memory unit needs to let the system's maintainer pre-populate the knowledge graph with relevant data. For example, an e-commerce company may want to upload its users' information so that the agent can respond with context from the first interaction.

Companies may present this data in various file formats, such as .csv, .pdf, .txt, .pptx, or others. That is why memary must support many configurable parsers under a parent parser, memaryParse. For example, a company running an agent with data in .csv and .docx files can configure a parent parser that supports both formats and loads the data into the knowledge graph before running their agents with memary.
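
To make the parent/child idea concrete, here is a minimal sketch of how a parent parser could dispatch to per-format parsers by file extension. Nothing here is an existing memary API: the class name `MemaryParse`, the `register` and `parse` methods, and the `Parser` signature are all assumptions made for illustration. Only .txt and .csv are shown because they need nothing beyond the standard library; a .docx parser would be registered the same way but would rely on a third-party library.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical signature: a parser takes raw file bytes and returns extracted text.
Parser = Callable[[bytes], str]


def parse_txt(data: bytes) -> str:
    """Decode a plain-text file as UTF-8."""
    return data.decode("utf-8")


def parse_csv(data: bytes) -> str:
    """Flatten a CSV file into one comma-separated line per row."""
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(", ".join(row) for row in rows)


class MemaryParse:
    """Hypothetical parent parser that dispatches on file extension."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register(self, extension: str, parser: Parser) -> None:
        """Configure a child parser for an extension such as '.csv'."""
        self._parsers[extension.lower()] = parser

    def supported(self) -> List[str]:
        """Return the configured extensions in sorted order."""
        return sorted(self._parsers)

    def parse(self, filename: str, data: bytes) -> str:
        """Pick the child parser from the filename's extension and run it."""
        ext = Path(filename).suffix.lower()
        if ext not in self._parsers:
            raise ValueError(f"No parser configured for '{ext}'")
        return self._parsers[ext](data)


# A maintainer configures only the formats they need.
parent = MemaryParse()
parent.register(".txt", parse_txt)
parent.register(".csv", parse_csv)
```

With this shape, supporting a new format would only mean writing one function and registering it, which fits the expectation that memaryParse grows over time.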

We expect memaryParse to expand over time. Initially, we hope to support the following formats:

.txt (already configured)
Table extraction
JSON
Images (.jpg, .jpeg, .png, .gif)
Documents and presentations (.pdf, .doc / .docx, .rtf, .pages, .pptx, .xml, .key)
Web (.htm, .html)
Spreadsheets (.xlsx, .xls, .csv, .numbers)

memaryParse should also support the following result types: TXT, MD, and JSON (we will look to add others in the future).
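
One possible way to model the three result types is an enum plus a single rendering function, sketched below. The names `ResultType` and `render`, the Markdown layout (source name as a heading), and the JSON keys `source` and `text` are all assumptions for illustration; the issue does not specify any of them.

```python
import json
from enum import Enum


class ResultType(Enum):
    """Hypothetical enum for the output formats named above."""
    TXT = "txt"
    MD = "md"
    JSON = "json"


def render(source: str, text: str, result_type: ResultType) -> str:
    """Render parsed text in the requested result type.

    The MD and JSON layouts are illustrative assumptions, not a spec.
    """
    if result_type is ResultType.TXT:
        return text
    if result_type is ResultType.MD:
        # Use the source filename as a heading above the extracted text.
        return f"# {source}\n\n{text}"
    if result_type is ResultType.JSON:
        return json.dumps({"source": source, "text": text})
    raise ValueError(f"Unsupported result type: {result_type}")
```

Keeping rendering separate from parsing means each format parser only has to produce text, and new result types can be added in one place.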

Resource for inspiration: https://github.com/run-llama/llama_parse/blob/main/llama_parse/utils.py

kingjulio8238 / Memary

memaryParse #44