sanskrit-coders / doc_curation

MIT License


doc curation

A package for curating doc file collections. Prominent features:

For users

Installation or upgrade

Usage

Google Drive API wrapper

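from doc_curation.pdf import drive_ocr

pdf_file = '/home/file.pdf'  # PDF to OCR
key_file = '/home/key.json'  # Google service account key (JSON)
# Split the PDF into smaller PDFs of small_pdf_pages pages and OCR them via Google Drive.
drive_ocr.split_and_ocr_on_drive(pdf_path=pdf_file, google_key=key_file, small_pdf_pages=5)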
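The small_pdf_pages argument suggests that the input is cut into smaller PDFs before OCR. Assuming it means the maximum number of pages per piece (our reading of the parameter name, not something stated here), the page ranges such a split would produce can be sketched as follows. page_chunks is a hypothetical helper for illustration only; it is not part of doc_curation.

```python
from typing import List, Tuple


# Hypothetical helper, not doc_curation API.
# Assumption: small_pdf_pages is the maximum number of pages in each smaller PDF.
def page_chunks(num_pages: int, small_pdf_pages: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges covering the whole PDF."""
    if num_pages < 0 or small_pdf_pages < 1:
        raise ValueError("num_pages must be >= 0 and small_pdf_pages >= 1")
    return [
        (start, min(start + small_pdf_pages - 1, num_pages))
        for start in range(1, num_pages + 1, small_pdf_pages)
    ]


# page_chunks(12, 5) -> [(1, 5), (6, 10), (11, 12)]
```

The last piece is simply shorter when the page count is not a multiple of small_pdf_pages.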

Command line invocation:

# For help and details:
/usr/bin/python3 -m doc_curation.pdf.drive_ocr --help
/usr/bin/python3 -m doc_curation.pdf.drive_ocr --input_path=/some/Dir/Or/File --google_key=/some/path/service_account_key.json --small_pdf_pages=5
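To launch the same command line entry point from a Python script (for example a scheduled job), one option is to assemble exactly the flags shown in the invocation and hand them to subprocess. This is a minimal sketch: drive_ocr_command and run_command are hypothetical helper names, only the three documented flags are used (check --help for others), and it defaults to a dry run that just prints the shell-quoted command.

```python
import shlex
import subprocess
from typing import List


def drive_ocr_command(input_path: str, google_key: str, small_pdf_pages: int = 5,
                      python: str = "/usr/bin/python3") -> List[str]:
    """Build the argv for `python -m doc_curation.pdf.drive_ocr` (hypothetical helper)."""
    return [
        python, "-m", "doc_curation.pdf.drive_ocr",
        f"--input_path={input_path}",
        f"--google_key={google_key}",
        f"--small_pdf_pages={small_pdf_pages}",
    ]


def run_command(cmd: List[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command; only execute it when dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Requires doc_curation to be installed; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_command(drive_ocr_command("/some/Dir/Or/File",
                                  "/some/path/service_account_key.json"))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.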

To OCR a PDF into a text (.txt) file with google_vision_pdf.py:

# Run the script directly:
python3 google_vision_pdf.py --input-file <input.pdf>
# Or run it as a module:
/usr/bin/python3 -m doc_curation.pdf.google_vision_pdf --input-file <input.pdf>
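The documented invocation takes a single --input-file. To OCR a whole folder, one option is to generate one such command per PDF. The sketch below only prints the shell-quoted commands so they can be reviewed first; each list can then be passed to subprocess.run(cmd, check=True). The script name batch_vision_ocr.py and the function vision_ocr_commands are hypothetical, and running the commands assumes doc_curation is installed for the python3 on your PATH.

```python
import shlex
import sys
from pathlib import Path
from typing import List


def vision_ocr_commands(pdf_dir: str, python: str = "python3") -> List[List[str]]:
    """Return one google_vision_pdf invocation per PDF under pdf_dir (searched recursively)."""
    # Match .pdf case-insensitively and skip directories.
    pdfs = sorted(p for p in Path(pdf_dir).rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pdf")
    return [[python, "-m", "doc_curation.pdf.google_vision_pdf", "--input-file", str(p)]
            for p in pdfs]


if __name__ == "__main__":
    # Hypothetical usage: python3 batch_vision_ocr.py /path/to/pdf_dir
    if len(sys.argv) != 2:
        print("usage: batch_vision_ocr.py PDF_DIR", file=sys.stderr)
    else:
        for cmd in vision_ocr_commands(sys.argv[1]):
            print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to try before spending any Google Cloud Vision quota.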

For contributors

Contact

Have a problem or question? Please head to GitHub.

Packaging

Build documentation

Testing

Run pytest in the repository's root directory.

Auxiliary tools