= {Title}
:title: pdftools
:author: Urs Roesch
:firstname: Urs
:lastname: Roesch
:email: github@bun.ch
:revnumber: 0.7.1
:keywords: PDF, CLI, Command Line, tools, documents, pdftk, ghostscript, poppler utils, tesseract, OCR
:!toc:
:icons: font
:git-user: uroesch
:repo-name: pdftools
ifdef::env-gitlab[]
:base-url: https://gitlab.com/{git-user}/{repo-name}
:email: gitlab@bun.ch
endif::env-gitlab[]
ifdef::env-github[]
:base-url: https://github.com/{git-user}/{repo-name}
:email: github@bun.ch
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::env-github[]
image:{base-url}/workflows/bash-compatibility/badge.svg[title="bash-compatibility", link="{base-url}/actions?query=workflow:bash-compatibility"]
image:{base-url}/workflows/test-pdftools/badge.svg[title="os-compatibility", link="{base-url}/actions?query=workflow:os-compatibility"]
image:{base-url}/workflows/create-docs/badge.svg[title="create-docs", link="{base-url}/actions?query=workflow:create-docs"]
ifndef::env-github,env-gitlab[]
image:icons/gitlab-avatar.png[float="left"]
endif::env-github,env-gitlab[]

ifdef::env-github,env-gitlab[]
+++
+++
endif::env-github,env-gitlab[]
A collection of PDF command line tools and wrappers for Linux, written in Bash.
They are convenience tools, so one does not have to remember long and cryptic
options and switches.

The heavy lifting is done by backend tools such as pdftk, ghostscript and the
poppler utils.
[[installation]]
== Installation instructions

The scripts are meant to be installed in a user's home directory. To do this
quickly, the `Makefile` in the root of the repository has a target called
`user_install`. To uninstall everything, use the `user_uninstall` target.
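Before running `make user_install`, it can help to confirm that the backend
tools are on the `PATH`. The snippet below is a minimal sketch and not part of
the repository's `Makefile`; the function name `missing_tools` is hypothetical,
and the binary names (`gs` for ghostscript, `convert` for ImageMagick,
`scanimage` for SANE) are assumptions based on the _Requires_ lines below.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdftools): print each command from the
# argument list that cannot be found on the PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Backend binaries taken from the "Requires" lines (names assumed):
# gs = ghostscript, pdfimages/pdfinfo = poppler-utils, convert = ImageMagick,
# scanimage = SANE.
missing="$(missing_tools pdftk gs pdfimages pdfinfo tesseract convert scanimage)"
if [ -n "$missing" ]; then
  printf 'Missing backend tools:\n%s\n' "$missing" >&2
fi
```

If nothing is reported, `make user_install` can be run; `make user_uninstall`
removes the scripts again.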
[[img2pdf]]
== img2pdf

A script to convert PNGs, TIFFs or JPEGs to PDF files.

License:: MIT
Requires:: bash, pdfcat, ImageMagick, pdftk

[[img2pdf-examples]]
=== Examples
output.pdf
[[img2pdf-usage]]
=== Usage

----
Usage:
  img2pdf

Options:
  -h | --help This message
  -d | --delete Delete the images after creating the PDF file.
  -o | --output
----
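Conceptually, turning images into one PDF means converting each image to a
single-page PDF and concatenating the pages. The function below is a
hypothetical dry-run sketch of that idea, not the script's actual code: it only
prints backend commands, assumes ImageMagick 6's `convert` binary, and uses
plain `pdftk ... cat output` where img2pdf itself relies on `pdfcat`.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print the commands that merge images into one PDF.
# Usage: img2pdf_plan <output.pdf> <image>...
img2pdf_plan() {
  local output="$1"; shift
  local img pages=()
  for img in "$@"; do
    # ImageMagick writes one single-page PDF per image.
    echo "convert ${img} ${img%.*}.pdf"
    pages+=("${img%.*}.pdf")
  done
  # pdftk concatenates the single-page PDFs in argument order.
  echo "pdftk ${pages[*]} cat output ${output}"
}
```

For example, `img2pdf_plan output.pdf page1.png page2.jpg` prints two `convert`
lines followed by `pdftk page1.pdf page2.pdf cat output output.pdf`.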
<<<
[[ocrpdf]]
== ocrpdf

Runs PDFs through OCR and saves the output as a text searchable PDF with the same name.

License:: MIT
Requires:: bash, pdfcat, pdfimages (poppler-utils), pdftk, tesseract

[[ocrpdf-examples]]
=== Examples

[[ocrpdf-usage]]
=== Usage

----
Usage:
  ocrpdf [options]

Options:
  -h | --help This message
  -q | --quiet Don't send display processed file names
  -V | --version Print version information and exit
  -l | --lang

Description: Runs PDFs through OCR and saves the output as a text searchable PDF with the same name.
----
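The backends listed under _Requires_ suggest a page-by-page pipeline: extract
the page images, OCR each one into a searchable PDF page, then join the pages.
The dry-run sketch below illustrates that pipeline under stated assumptions; it
is not ocrpdf's code. It assumes one scanned image per page (so `pdfimages`
numbers them `<root>-000.png`, `<root>-001.png`, ...) and writes to a
hypothetical `<name>_ocr.pdf`, whereas ocrpdf itself keeps the original name.

```shell
#!/usr/bin/env bash
# Hypothetical dry run of a page-by-page OCR pipeline.
# Usage: ocrpdf_plan <input.pdf> <tesseract-lang> <page-count>
ocrpdf_plan() {
  local input="$1" lang="$2" pages="$3"
  local base="${input%.pdf}" img n page_pdfs=()
  # 1. Extract the embedded page images as PNG files (poppler-utils).
  echo "pdfimages -png ${input} ${base}"
  for ((n = 0; n < pages; n++)); do
    printf -v img '%s-%03d.png' "$base" "$n"
    # 2. The "pdf" config makes tesseract write <outbase>.pdf containing the
    #    page image plus an invisible, searchable text layer.
    echo "tesseract ${img} ${img%.png} -l ${lang} pdf"
    page_pdfs+=("${img%.png}.pdf")
  done
  # 3. Join the per-page PDFs (hypothetical output name <name>_ocr.pdf).
  echo "pdftk ${page_pdfs[*]} cat output ${base}_ocr.pdf"
}
```

Several languages can be combined the way tesseract expects, e.g.
`ocrpdf_plan scan.pdf deu+eng 3`.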
<<<
[[pdfcat]]
== pdfcat

A quick hack to replace `pdfunite`, as it destroys too much of the originals'
metadata.

License:: MIT
Requires:: bash, pdftk >= 2.0

[[pdfcat-examples]]
=== Examples

[[pdfcat-usage]]
=== Usage

----
Usage:
pdfcat [
----
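One way to concatenate PDFs without losing metadata is to dump the first
file's metadata with pdftk, concatenate, and write the metadata back. The
dry-run sketch below shows that approach; it is an illustration, not
necessarily how pdfcat works, and `pdfcat_plan`, `meta.txt` and the `.tmp` file
are hypothetical names. Note that `update_info_utf8` also applies the bookmarks
contained in the dump, which stay valid because the first file's pages come
first.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: concatenate PDFs with pdftk and copy the first file's
# metadata onto the result.
# Usage: pdfcat_plan <output.pdf> <input.pdf>...
pdfcat_plan() {
  local output="$1"; shift
  # Dump the first input's Info dictionary (and bookmarks) as UTF-8 text.
  echo "pdftk $1 dump_data_utf8 output meta.txt"
  # Concatenate all inputs in argument order.
  echo "pdftk $* cat output ${output}.tmp"
  # Write the saved metadata into the final file.
  echo "pdftk ${output}.tmp update_info_utf8 meta.txt output ${output}"
}
```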
<<<
[[pdfmeta]]
== pdfmeta

A wrapper script around `pdftk` to manipulate a PDF's metadata.

License:: MIT
Requires:: bash >= 4.0, pdftk >= 2.0

[[pdfmeta-examples]]
=== Examples

[[pdfmeta-usage]]
=== Usage

----
Usage:
  pdfmeta

Options:
  -h | --help This message
  -k | --keywords Comma separated list of keywords
  -s | --subject Define the PDFs subject
  -t | --title Define the PDFs title
  -c | --creator Define the PDFs creator program or library
  -p | --producer Define the PDFs producing program
  -C | --creation-date Set the creation date of the PDF
  -M | --modification-date Set the modification date of the PDF
  -V | --version Display version and exit
----
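pdftk changes metadata through a text file of `InfoBegin`/`InfoKey`/`InfoValue`
records passed to `update_info_utf8`, and PDF dates use the `D:YYYYMMDDHHmmSS`
format followed by a time zone (`Z` for UTC). The sketch below builds both; it
is a hypothetical illustration, not pdfmeta's code. The function names are
made up, and `pdf_date` relies on GNU `date -d`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pdftk info records for each KEY=VALUE argument.
# Only the first "=" separates key and value, so values may contain "=".
pdf_info_records() {
  local pair
  for pair in "$@"; do
    printf 'InfoBegin\nInfoKey: %s\nInfoValue: %s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Hypothetical helper: convert "YYYY-MM-DD hh:mm:ss" (read as UTC) into a
# PDF date string. Requires GNU date.
pdf_date() {
  date -u -d "$1" +'D:%Y%m%d%H%M%SZ'
}
```

The records would then be applied with pdftk, for example:
`pdf_info_records "Title=Report" "CreationDate=$(pdf_date '2022-01-26 23:10:20')" > info.txt`
followed by `pdftk in.pdf update_info_utf8 info.txt output out.pdf`.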
`pdfmeta` works only with the https://snapcraft.io/pdftk[snap] or with version
3.2.x of https://gitlab.com/pdftk-java/pdftk[`pdftk-java`]. With every other
version of `pdftk`, `CreationDate` and `ModDate` fail when running the unit
tests. The changed PDF itself has no problem, but `pdfinfo` from the
`poppler-utils` package can't handle the changed entries and reports them as
empty.

<<<
[[pdfresize]]
== pdfresize

A wrapper around `ghostscript` to reduce the size of a scanned document.

License:: MIT
Requires:: bash, ghostscript

[[pdfresize-examples]]
=== Examples

[[pdfresize-usage]]
=== Usage

----
Usage: pdfresize [-q pdfsettings] -i -o

Options:
  -h | --help This message
  -i | --input A PDF file preferably of high resolution
  -o | --output
----
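The `pdfsettings` argument presumably maps to ghostscript's `-dPDFSETTINGS`
presets (`screen`, `ebook`, `printer`, `prepress`, `default`), which trade image
resolution for file size. The dry-run sketch below prints the kind of
ghostscript command line such a wrapper saves you from typing; the function
name and the `ebook` default are assumptions, not pdfresize's behavior.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print the ghostscript call a resize wrapper might use.
# Usage: pdfresize_cmd <input.pdf> <output.pdf> [preset]
pdfresize_cmd() {
  local input="$1" output="$2" preset="${3:-ebook}"
  # Accept only ghostscript's built-in PDFSETTINGS presets.
  case "$preset" in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown preset: ${preset}" >&2; return 1 ;;
  esac
  local cmd=(gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4
    "-dPDFSETTINGS=/${preset}" -dNOPAUSE -dQUIET -dBATCH
    "-sOutputFile=${output}" "$input")
  # To actually run it, replace the echo with: "${cmd[@]}"
  echo "${cmd[*]}"
}
```

`screen` gives the smallest files; `printer` and `prepress` keep more detail.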
<<<
[[pdf2pdfa]]
== pdf2pdfa

A small script to convert a PDF to PDF/A.

[[pdf2pdfa-examples]]
=== Examples

* Convert `sample.pdf` to a PDF/A-2 named `sample_a.pdf`.
* Convert `sample.pdf` to a PDF/A-2 named `sample_pdfa.pdf`.
* Convert `sample.pdf` to a PDF/A-1 named `sample_a.pdf`.
* Convert `sample.pdf` to a PDF/A-3, exiting on errors.
* Convert `sample.pdf` to a PDF/A-2 with color model CMYK.

[[pdf2pdfa-usage]]
=== Usage

----
Usage:
pdf2pdfa [
----
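The examples above vary the PDF/A level, the color model and the error
handling. Assuming the conversion is done with ghostscript, those map to
`-dPDFA`, `-sColorConversionStrategy` and `-dPDFACompatibilityPolicy`. The
dry-run sketch below shows that mapping; it is hypothetical and not pdf2pdfa's
code. A complete conversion typically also needs an output intent ICC profile
(for example via ghostscript's `PDFA_def.ps`), which is omitted here.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: ghostscript options for a PDF/A conversion.
# Usage: pdf2pdfa_cmd <input.pdf> <output.pdf> [level] [color] [policy]
pdf2pdfa_cmd() {
  local input="$1" output="$2" level="${3:-2}" color="${4:-RGB}" policy="${5:-1}"
  # level:  PDF/A part 1, 2 or 3
  # color:  RGB, CMYK or Gray
  # policy: 1 = drop non-conforming features with a warning,
  #         2 = abort with an error (the "exiting on errors" case)
  local cmd=(gs "-dPDFA=${level}" -dBATCH -dNOPAUSE -dQUIET
    -sDEVICE=pdfwrite "-sColorConversionStrategy=${color}"
    "-dPDFACompatibilityPolicy=${policy}"
    "-sOutputFile=${output}" "$input")
  echo "${cmd[*]}"
}
```

For the CMYK example above, `pdf2pdfa_cmd sample.pdf sample_a.pdf 2 CMYK` would
print the corresponding command with `-sColorConversionStrategy=CMYK`.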
<<<
[[scan2pdf]]
== scan2pdf

A frontend for `scanimage` that has only been tested against the Canon LiDE 210
scanner.

Some but not all notable features are:

* Character recognition via `tesseract`.
* Symlinked to `scan2png` it produces PNG and symlinked to `scan2jpg` it
  produces JPEG image output.

[[scan2pdf-examples]]
=== Examples

scan_YYYY-MM-DD_hh-mm-ss.pdf

scan_YYYY-MM-DD_hh-mm-ss.pdf

<.> Provide a file name or press enter to accept the default name.
<.> Menu option `1` scans a page, then returns to the prompt.
<.> Menu option `2` writes all pages to a PDF file and prompts for a new name.
<.> Menu option `3` writes all pages to a PDF file and exits.
<.> Scan one page.
<.> Scan another page.
<.> Write the PDF and exit.

scan_YYYY-MM-DD_hh-mm-ss.jpg

[[scan2pdf-usage]]
=== Usage
----
Usage: scan2pdf

  --interactive -I Interactive mode
  --type -t Document Type
      Possible values are:
        d[ocument] for a text document
        i[llustration] for a drawing
        ph[otograph] for a photographic picture
        pr[int] for a scan from a print e.g. newspaper
        r[aw] for not applying any post-processing
      Default: document
  --resolution -r Resolution of scan
      Possible values are 75, 150, 300, 600, 1200
      Default: 300
  --page -p Page Size
      Possible values are A4, A5, A6, Letter, CreditCard, CD-Cover
      Default: A4
  --depth -d Color depth of scan
      1 for LineArt (Black & White)
      8 for Grayscale and Color
      16 for Color
      Default: 8
  --format -f PDF image compression
      Possible values are jpeg, zip, lzw
      Default: jpeg
  --quality -q Recommended for jpeg, zip, png
      Values for jpeg from 0 to 100
      Values for png and zip from 0 to 9
      Default: 90
  --mode -m Color mode of scan
      Possible values are Lineart, Gray, Color
      Default: Color
  --ocr -R Run the scan through character recognition
      Default: false
  --ocr-lang -L Set the language for the character recognition
      Every language 'tesseract' supports
      Default: deu+eng+fra+ita+jpn+osd
  --output -o Filename of PDF file
      Default: scan_2022-01-26_23-10-20
  --orientation -O Document orientation
      Possible options p[ortrait], l[andscape]
      Default: portrait
  --scanner -s Set the scanner to be used
      E.g: genesys:libusb:001:005
  --help -h This message
----
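Two details of the usage output lend themselves to a short illustration: the
default output name follows the `scan_YYYY-MM-DD_hh-mm-ss` pattern, and each
page is ultimately captured by SANE's `scanimage`. The sketch below is
hypothetical and not scan2pdf's code; the function names are made up, and the
`--resolution`/`--mode` values simply mirror the defaults listed above. Both
are backend-specific options, so check them for your device with
`scanimage --help --device-name=<device>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: default output name in the scan_YYYY-MM-DD_hh-mm-ss
# pattern used by scan2pdf.
default_scan_name() {
  date +'scan_%Y-%m-%d_%H-%M-%S'
}

# Hypothetical dry run of a single-page scan with scanimage.
# Usage: scan_cmd <device> <output.png> [resolution] [mode]
scan_cmd() {
  local device="$1" output="$2" resolution="${3:-300}" mode="${4:-Color}"
  local cmd=(scanimage -d "$device" --format=png
    --resolution "$resolution" --mode "$mode")
  # scanimage writes the image to stdout, hence the redirection.
  echo "${cmd[*]} > ${output}"
}
```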
// vim: set colorcolumn=80 textwidth=80 spell spelllang=en_us :