simonw/s3-ocr
Tools for running OCR against files stored in S3
Apache License 2.0
115 stars
7 forks
issues
Not all pages are ocr'd, but Textract claims otherwise
#28
captnswing
closed
1 month ago
4
Expose difference between HANDWRITING and PRINTED and so on
#27
simonw
opened
2 years ago
1
Installing s3-ocr in a GitHub Actions workflow usually fails to work
#26
simonw
closed
2 years ago
9
s3-ocr tool missing all of its commands if python3-click OS package (with click v7) is installed
#25
simonw
closed
2 years ago
31
Textract needs to run in the same region as the S3 bucket
#24
simonw
closed
2 years ago
0
Pages that failed to scan end up missing entirely from the index - should have rows with blank text instead
#23
simonw
closed
2 years ago
14
--dry-run option for s3-ocr start
#22
simonw
closed
2 years ago
0
LimitExceededException when calling the StartDocumentTextDetection operation
#21
ethanscorey
closed
2 years ago
13
Ability to start OCR on just the files in a specified prefix
#20
simonw
closed
2 years ago
0
Re-use existing work if a document has an already processed ETag
#19
simonw
closed
2 years ago
2
Support files other than PDFs
#18
simonw
opened
2 years ago
1
status command should show if OCR has completed
#17
simonw
opened
2 years ago
2
Add a live demo
#16
simonw
closed
2 years ago
11
s3-ocr inspect-job job_id command
#15
simonw
closed
2 years ago
1
Consider using /s3-ocr/key instead of key.s3-ocr.json
#14
simonw
opened
2 years ago
0
Options to do table, form and query extraction using get_document_analysis
#13
simonw
opened
2 years ago
7
s3-ocr file command to process a single PDF
#12
simonw
opened
2 years ago
1
Running fetch and text against jobs that have not yet completed should show an error
#11
simonw
opened
2 years ago
0
Don't default to OCRing everything
#10
simonw
closed
2 years ago
1
Swap order of bucket and database in s3-ocr index command
#9
simonw
closed
2 years ago
0
s3-ocr text command for retrieving just the OCR text
#8
simonw
closed
2 years ago
1
s3-ocr fetch command to fetch OCR results
#7
simonw
closed
2 years ago
2
Bug: index run against folder with new results did the wrong thing
#6
simonw
closed
2 years ago
1
Option to start processing specific files
#5
simonw
closed
2 years ago
0
Add tests
#4
simonw
closed
2 years ago
2
Add help output to the README
#3
simonw
closed
2 years ago
0
Command for creating a SQLite database of the OCR results
#2
simonw
closed
2 years ago
4
Initial design
#1
simonw
closed
2 years ago
2