jscancella opened this issue 3 years ago.
PowerShell (${pwd} expands to the current directory):
docker run --rm -it -v ${pwd}:/data ubma/ocr-fileformat ocr-transform hocr alto2.0 0001_xStart0_xEnd937.hocr 0001_xStart0_xEnd937.alto -- '!indent=yes'
bash (Linux/macOS):
docker run --rm -it -v "$PWD":/data ubma/ocr-fileformat ocr-transform hocr alto2.0 0001_xStart0_xEnd937.hocr 0001_xStart0_xEnd937.alto -- '!indent=yes'
Example input file: 0001_xStart0_xEnd937.hocr.txt (attached). Rename it to 0001_xStart0_xEnd937.hocr before running the command.
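To convert a whole directory of hOCR pages instead of a single file, the bash command can be wrapped in a loop. This is a minimal sketch, not part of ocr-fileformat itself. It assumes the script runs from the directory holding the .hocr files, which gets mounted as /data exactly as in the single-file command. It also assumes that -it can be dropped, because the loop does not need an interactive terminal. The helper name alto_name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert every *.hocr file in the current directory to ALTO 2.0
# using the same ubma/ocr-fileformat invocation as the single-file command.
# Assumption: run from the directory containing the .hocr files (mounted as /data).
set -euo pipefail
shopt -s nullglob   # if no *.hocr files exist, the loop below does nothing

# Hypothetical helper: derive the output name the same way as the example,
# e.g. 0001_xStart0_xEnd937.hocr -> 0001_xStart0_xEnd937.alto
alto_name() {
  printf '%s\n' "${1%.hocr}.alto"
}

for hocr in *.hocr; do
  alto="$(alto_name "$hocr")"
  echo "Converting $hocr -> $alto"
  # -it is omitted (assumption) so the script also works without a TTY, e.g. from cron.
  docker run --rm -v "$PWD":/data ubma/ocr-fileformat \
    ocr-transform hocr alto2.0 "$hocr" "$alto" -- '!indent=yes'
done
```

Files still carrying the .hocr.txt attachment extension are not matched by *.hocr, so rename them first.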
See https://github.com/UB-Mannheim/ocr-fileformat for converting Tesseract hOCR output to Chronam ALTO.