itst / pdf-highlights

Export your PDF highlights to markdown files.

About

This script extracts annotations (highlights, comments, etc.) from a PDF file and formats them as plain text.
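
The following is a minimal sketch of one way to recover the highlighted text. It is not the script's actual code. The highlight annotation's /QuadPoints array (defined in the PDF specification, eight numbers per highlighted quadrilateral) is turned into boxes, and a character counts as highlighted when at least half of its bounding box is covered. The 50% threshold, the helper names, and the `(text, bbox)` input format are assumptions for illustration. In practice the character boxes would come from a layout pass such as pdfminer.six's.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF user space


def quadpoints_to_boxes(quadpoints: Sequence[float]) -> List[Box]:
    """Turn a /QuadPoints array (8 numbers per quad) into axis-aligned boxes."""
    if len(quadpoints) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for i in range(0, len(quadpoints), 8):
        xs = quadpoints[i:i + 8:2]      # x1, x2, x3, x4
        ys = quadpoints[i + 1:i + 8:2]  # y1, y2, y3, y4
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def highlighted_text(chars: Iterable[Tuple[str, Box]], boxes: List[Box],
                     threshold: float = 0.5) -> str:
    """Keep characters whose box is covered by some highlight box by >= threshold.

    Assumption: chars are (text, bbox) pairs already in reading order.
    """
    out = []
    for text, box in chars:
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area <= 0:
            continue  # skip degenerate boxes (e.g. zero-width glyphs)
        covered = max((overlap_area(box, hb) for hb in boxes), default=0.0)
        if covered / area >= threshold:
            out.append(text)
    return "".join(out)
```

For example, `highlighted_text([("H", (10, 20, 15, 30)), ("!", (70, 20, 75, 30))], quadpoints_to_boxes([10, 30, 60, 30, 10, 20, 60, 20]))` keeps only `"H"`.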

The script uses colormath to identify the highlights' colors (see the wiki). The default template uses these colors to determine hierarchy and meaning.
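
colormath provides color-space conversions and delta E (color difference) functions. The sketch below shows the idea without that dependency. It converts an annotation's /C color (RGB components in [0, 1]) to CIE Lab and picks the nearest entry of a palette by CIE76 delta E. The palette and its names are hypothetical. The script may also use a different delta E formula, since colormath offers CIE76, CIE94, CIEDE2000 and CMC variants.

```python
from math import sqrt
from typing import Dict, Tuple

RGB = Tuple[float, float, float]  # sRGB components in [0, 1], like a PDF /C entry
Lab = Tuple[float, float, float]

# Hypothetical reference palette; the script's real color names are in the wiki.
PALETTE: Dict[str, RGB] = {
    "yellow": (1.0, 1.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}


def _linearize(c: float) -> float:
    """Undo the sRGB gamma curve."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIE Lab companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb: RGB) -> Lab:
    r, g, b = (_linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ, D65 white point.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = _f(x / 0.95047), _f(y / 1.0), _f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(lab1: Lab, lab2: Lab) -> float:
    """CIE76 color difference: Euclidean distance in Lab space."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


def nearest_color(rgb: RGB, palette: Dict[str, RGB] = PALETTE) -> str:
    """Name of the palette entry perceptually closest to rgb."""
    lab = rgb_to_lab(rgb)
    return min(palette, key=lambda name: delta_e76(lab, rgb_to_lab(palette[name])))
```

For instance, a slightly off-yellow highlight such as `(0.98, 0.95, 0.1)` is mapped to `"yellow"`.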

At present, the following annotations are supported:

For each annotation, the page number is given, along with the associated (highlighted or underlined) text, if any. Additionally, if the document includes outlines (aka bookmarks), such as those generated by the LaTeX hyperref package, these are used to identify the section of the document to which each annotation refers.
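
Here is a sketch of the section lookup. It assumes the outline entries have already been resolved to `(page index, y, title)` triples. Resolving outline destinations is the fiddly part and is not shown. PDF y coordinates grow upwards, so entries are ordered by page and then by descending y. Each annotation is assigned to the last entry at or before its own position. `OutlineIndex` is a hypothetical helper, not part of the script.

```python
from bisect import bisect_right
from typing import Iterable, Optional, Tuple

Outline = Tuple[int, float, str]  # (page index, y of the destination, section title)


class OutlineIndex:
    """Maps a position in the document to the enclosing outline entry."""

    def __init__(self, outlines: Iterable[Outline]) -> None:
        # Larger y means higher on the page, i.e. earlier in reading order.
        ordered = sorted(outlines, key=lambda o: (o[0], -o[1]))
        self._keys = [(page, -y) for page, y, _ in ordered]
        self._titles = [title for _, _, title in ordered]

    def section_for(self, page: int, y: float) -> Optional[str]:
        """Title of the last outline entry at or before (page, y), if any."""
        i = bisect_right(self._keys, (page, -y))
        return self._titles[i - 1] if i > 0 else None
```

Sorting once and bisecting keeps each lookup at O(log n). That matters little for a typical paper, but it is no harder to write than a linear scan.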

See the wiki for more information.

Installation

 pip install pdfminer.six chardet six colormath Jinja2 pathlib
 python setup.py install

Usage

pdf-highlights.py FILE.PDF [> OUTPUT]

Dependencies

My own setup:

Output formatting

There's a Jinja2 template you can adapt as you like. The script exposes the following data to the template:

See the wiki for more information.
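
Below is a minimal example of a custom template, rendered with Jinja2 directly. The variable `annotations` and the fields `text`, `page`, `color` and `comment` are hypothetical placeholders for the data the script actually exposes (see the wiki). The rule that red highlights become headings is also made up for illustration.

```python
from jinja2 import Environment

# Hypothetical template: red highlights become headings, everything else a bullet.
TEMPLATE = """\
{% for a in annotations %}
{% if a.color == "red" %}
## {{ a.text }}
{% else %}
- {{ a.text }} (p. {{ a.page }})
{% if a.comment %}
  > {{ a.comment }}
{% endif %}
{% endif %}
{% endfor %}
"""

# trim_blocks/lstrip_blocks stop lines holding only {% ... %} tags from leaving blank lines.
env = Environment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)


def render(annotations) -> str:
    """Render a list of (hypothetical) annotation dicts to Markdown."""
    return template.render(annotations=annotations)


if __name__ == "__main__":
    print(render([
        {"color": "red", "text": "Results", "page": 3, "comment": None},
        {"color": "yellow", "text": "key finding", "page": 4, "comment": "check this"},
    ]))
```

With both whitespace options enabled, the output stays a tight Markdown list. Without them, every tag line would leave an empty line behind.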

Author

Original author is Andrew Baumann. Thank you, Andrew!
This fork is maintained by Sascha A. Carlin.