Many of the tools currently cannot read from special files such as `/dev/stdin` in bash, or more generally accept input from `stdin`, because they perform some unnecessary seeks.
Additionally, it would be nice to add some features, for example filtering by word confidence. This could be done in hocr-text, but we could also have a streaming hOCR filter tool that takes hOCR as input and outputs hOCR, letting through only words at or above a given confidence. It would need to be streaming, which makes it a little tricky, but it would be cool to, for example, pipe Tesseract output directly into such a tool.
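To make the idea concrete, here is a rough sketch of what such a filter could look like. It is not existing code from these tools: the names `filter_pages`, `drop_low_confidence_words`, and `min_conf` are made up for illustration. It assumes the input is well-formed XML, that words are `ocrx_word` spans, and that the confidence is stored as `x_wconf` in the `title` attribute (as Tesseract emits it). It reads the stream sequentially with `iterparse`, so no seeks are needed, and it only emits the filtered `ocr_page` elements rather than a complete hOCR document.

```python
import re
import sys
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" style prefix (global setting).
ET.register_namespace("", XHTML_NS)

# Assumption: confidence is stored as "x_wconf <number>" in the title attribute.
WCONF_RE = re.compile(r"x_wconf\s+(\d+(?:\.\d+)?)")


def has_class(elem, name):
    """Return True if the element's class attribute contains `name`."""
    return name in elem.get("class", "").split()


def word_confidence(elem):
    """Return the x_wconf value of a word element, or None if absent."""
    match = WCONF_RE.search(elem.get("title", ""))
    return float(match.group(1)) if match else None


def drop_low_confidence_words(parent, min_conf):
    """Remove ocrx_word descendants whose confidence is below min_conf."""
    for child in list(parent):
        if has_class(child, "ocrx_word"):
            conf = word_confidence(child)
            if conf is not None and conf < min_conf:
                # Note: this also drops the whitespace following the word.
                parent.remove(child)
                continue
        drop_low_confidence_words(child, min_conf)


def filter_pages(stream, min_conf):
    """Yield each ocr_page as a string once it has been fully read.

    `stream` is any binary file-like object; it is only read forward,
    so pipes and /dev/stdin work.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if has_class(elem, "ocr_page"):
            drop_low_confidence_words(elem, min_conf)
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # free the page's children before reading the next


def main(min_conf=80):
    """Read hOCR from stdin and write the filtered pages to stdout."""
    for page in filter_pages(sys.stdin.buffer, min_conf):
        sys.stdout.write(page)
```

Calling `main()` at the bottom of a script would turn this into a pipe filter, e.g. `tesseract page.png stdout hocr | python hocr_filter_sketch.py` (the script name is hypothetical). Processing one page at a time keeps memory bounded by the largest page; a real tool would also need to pass through the document head and wrapper elements, which this sketch skips.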