swarmauri / swarmauri-sdk

A modular multimodal framework for AI applications
https://swarmauri.com
Apache License 2.0

#501 Added the PDFQueryParser feature, along with a component file and a test file. #749

Closed: Rexon-Pambujya closed this pull request 3 days ago

Rexon-Pambujya commented 1 week ago

Link to Issue: #501

Feature Name

PDFQueryParser.py

Feature Description

Extracts text from PDF files using PDFQuery.
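A minimal sketch of the kind of extraction this feature describes, assuming pdfquery's documented interface (`pdfquery.PDFQuery(path)`, `.load()`, then PyQuery-style selection through `.pq(...)` on `LTTextLineHorizontal` elements). `extract_pdf_text` is a hypothetical helper name; this is not the code submitted in the PR and does not use swarmauri's parser base class.

```python
import os


def extract_pdf_text(path: str) -> str:
    """Return the text of a PDF, one line per layout text line (hypothetical helper)."""
    # Fail early with a clear error before touching the third-party library.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported lazily so this module can be loaded without pdfquery installed.
    import pdfquery

    pdf = pdfquery.PDFQuery(path)
    # load() with no arguments parses every page into an lxml element tree.
    pdf.load()

    # Each horizontal line of text in the layout becomes an LTTextLineHorizontal
    # element; .items() yields one PyQuery object per matched element.
    lines = [el.text() for el in pdf.pq("LTTextLineHorizontal").items()]
    return "\n".join(line for line in lines if line)
```

Usage would be `text = extract_pdf_text("report.pdf")`, where `report.pdf` is a placeholder path. Deferring the import keeps pdfquery an optional dependency until a PDF is actually parsed.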

Description

Files Added:

I kindly ask the maintainers to review my code and point out any mistakes. Thank you!

cobycloud commented 3 days ago

pdfquery was last updated in 2016: https://pypi.org/project/pdfquery/#history

cobycloud commented 3 days ago

It does not have any dependencies.

cobycloud commented 3 days ago

It does have an MIT license: https://github.com/jcushman/pdfquery

cobycloud commented 3 days ago

pdfquery appears to be more outdated than pdfminer. Additionally, pdfminer appears to have the biggest following.

cobycloud commented 3 days ago

pdfquery appears to be more lightweight than pdfminer in terms of file count and complexity of composition.