Rexon-Pambujya closed 3 days ago
pdfquery was last updated in 2016: https://pypi.org/project/pdfquery/#history
It does not have any dependencies.
It is MIT licensed: https://github.com/jcushman/pdfquery
pdfquery appears to be more outdated than pdfminer. Additionally, pdfminer appears to have the larger following.
pdfquery appears to be more lightweight than pdfminer in terms of file count and complexity of composition.
Link to Issue: #501
Feature Name
PDFQueryParser.py
Feature Description
Using PDFQuery, extract text from PDF files
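As a rough illustration of the feature, the sketch below shows one way text could be pulled from a PDF with pdfquery. It is not the code added in this PR: `extract_pdf_text` is a hypothetical helper name, and the choice of the `LTTextLineHorizontal` selector and the `.pdf` suffix check are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Union


def extract_pdf_text(source: Union[str, Path]) -> List[str]:
    """Return the non-empty horizontal text lines of a PDF (hypothetical helper)."""
    # Validate the input before touching the optional dependency.
    if not isinstance(source, (str, Path)):
        raise TypeError(f"expected a file path, got {type(source).__name__}")
    path = Path(source)
    if path.suffix.lower() != ".pdf":  # assumption: only .pdf paths are accepted
        raise ValueError(f"not a PDF file: {path}")

    # Imported lazily so this module still imports when pdfquery is absent.
    import pdfquery

    pdf = pdfquery.PDFQuery(str(path))
    pdf.load()  # with no arguments, loads every page into an element tree

    # pdfminer layout names act as selectors; each match is an lxml element
    # whose .text holds the text stored directly on that element.
    lines = pdf.pq("LTTextLineHorizontal")
    return [el.text.strip() for el in lines if el.text and el.text.strip()]
```

Calling `extract_pdf_text("report.pdf")` (a placeholder path) would return a list of strings, one per text line found. Depending on how a given PDF is laid out, some text may sit on other layout elements such as `LTTextBoxHorizontal`, so the selector may need widening.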
Description
Files Added:
[x] pkgs\community\swarmauri_community\parsers\concrete\PDFQueryParser.py
This file contains the implementation of the PDFQueryParser class, which parses PDF documents and reads their text content.
[x] pkgs\community\tests\unit\parsers\PDFQueryParser_test.py
This file contains the unit tests for the PDFQueryParser class. They check that parsing works correctly by validating various input types and verifying the accuracy of the extracted text.