simonw / s3-ocr

Tools for running OCR against files stored in S3
Apache License 2.0

Don't default to OCRing everything #10

Closed simonw closed 2 years ago

simonw commented 2 years ago

OCRing everything is a dangerous default: if someone runs this against a bucket containing tens of thousands of PDFs, it could cost them a lot of money.

I'm going to switch it to working like this:

s3-ocr start name-of-bucket path/to/one.pdf path/to/two.pdf

If you don't provide any paths, it will show an error message.

To OCR everything, use:

s3-ocr start name-of-bucket --all
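
A minimal sketch of the selection rule described above, in plain Python. This is not the s3-ocr implementation: `select_keys`, `UsageError`, and the parameter names are hypothetical, chosen only to illustrate "explicit paths, or an explicit opt-in to everything, otherwise an error".

```python
class UsageError(Exception):
    """Raised when neither explicit paths nor the 'all' opt-in is given."""


def select_keys(bucket_keys, paths=(), ocr_all=False):
    """Return the keys that should be OCRed.

    Hypothetical helper for illustration only:
    - ocr_all=True mirrors passing --all and selects every key.
    - Otherwise at least one explicit path is required.
    """
    if ocr_all:
        # Explicit opt-in: process everything in the bucket listing.
        return list(bucket_keys)
    if not paths:
        # No paths and no opt-in: refuse rather than OCR the whole bucket.
        raise UsageError("Specify one or more paths, or use --all")
    # Only the paths the caller named are processed.
    return list(paths)
```

With this shape, calling `select_keys(keys)` with no paths raises an error instead of silently processing the whole bucket, so the expensive case has to be requested on purpose.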

simonw commented 2 years ago

This will replace the work I did in: