run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

Create PDFPlumberReader #798

Closed. JAlexMcGraw closed this pull request 9 months ago.

JAlexMcGraw commented 9 months ago

Description

Created the PDFPlumberReader class and its associated files. Also updated the main README to tell users to run "poetry run make test", because the previously documented CLI command didn't work.

The motivation is that the built-in PDFReader did not extract text as well as the pdfplumber package: many words ended up split in half or missing entirely.

This reader requires pdfplumber to be installed.
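For readers unfamiliar with pdfplumber, here is a minimal sketch of how a page-by-page reader along these lines might look. It is not the code from this PR. The class name PDFPlumberReaderSketch, the helper pages_to_documents, the stand-in Document dataclass (used in place of LlamaIndex's Document), the extra_info parameter and the metadata keys are all hypothetical. The pdfplumber calls (pdfplumber.open, PDF.pages, Page.extract_text) are part of pdfplumber's public API. pdfplumber is imported inside load_data so it stays an optional dependency.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for LlamaIndex's Document: text plus a metadata dict."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def pages_to_documents(
    page_texts: List[Optional[str]],
    file_name: str,
    extra_info: Optional[Dict[str, Any]] = None,
) -> List[Document]:
    """Build one Document per page. The metadata keys are assumptions."""
    docs = []
    for page_number, text in enumerate(page_texts, start=1):
        metadata: Dict[str, Any] = {"page_number": page_number, "file_name": file_name}
        if extra_info:
            metadata.update(extra_info)
        # Some pdfplumber versions return None for pages without a text layer.
        docs.append(Document(text=text or "", metadata=metadata))
    return docs


class PDFPlumberReaderSketch:
    """Hypothetical reader that extracts each PDF page's text with pdfplumber."""

    def load_data(
        self, file: Path, extra_info: Optional[Dict[str, Any]] = None
    ) -> List[Document]:
        try:
            import pdfplumber  # optional dependency, imported only when used
        except ImportError as exc:
            raise ImportError("pdfplumber is required: pip install pdfplumber") from exc

        # Read all page text while the file is still open.
        with pdfplumber.open(file) as pdf:
            page_texts = [page.extract_text() for page in pdf.pages]
        return pages_to_documents(page_texts, Path(file).name, extra_info)
```

Keeping the text-to-Document step separate from the pdfplumber call means that step can be tested without a PDF on disk. Usage would look like PDFPlumberReaderSketch().load_data(Path("report.pdf")), where "report.pdf" is a placeholder file name.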

Fixes # (issue)

Type of Change


How Has This Been Tested?


Suggested Checklist: