Closed by particitae 12 months ago
Sorry for the delay. You can use kraken, but the conversion isn't round-trippable and you will lose information that isn't "useful" for processing inside kraken. A fairly decent tool seems to be ocr-fileformat (https://github.com/UB-Mannheim/ocr-fileformat), but I haven't personally used it.
ocr-fileformat doesn't work.....
The conversion with hOCR format seems OK.
On 23/10/09 06:09AM, Particitae wrote:
> ocr-fileformat doesn't work.....
> the conversion with Hocr format seems ok
Hm, OK. What doesn't work, and what information are you losing? If you're only interested in regions and lines and don't mind everything else being discarded, you can use the XMLPage -> Segmentation -> serialization.serialize pipeline to convert between the different formats, but it is absolutely not designed to be lossless.
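To make the "regions and lines only" trade-off concrete, here is a minimal sketch of such a lossy conversion. It uses only Python's standard library and is not kraken's XMLPage or serialization code, whose exact signatures are not given in this thread. The function names (`page_to_alto`, `_points_to_bbox`, `_set_geometry`) are hypothetical. It assumes a namespaced PageXML document with TextRegion and TextLine elements, and it has not been validated against the ALTO schema (for example, it writes no Description element).

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace; the PAGE namespace is read from the input document.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def _q(tag):
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def _points_to_bbox(points):
    """Turn a PAGE 'x1,y1 x2,y2 ...' string into (hpos, vpos, width, height)."""
    xy = [tuple(int(float(v)) for v in p.split(",")) for p in points.split()]
    xs = [p[0] for p in xy]
    ys = [p[1] for p in xy]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def _set_geometry(alto_el, page_el, ns):
    """Copy a PAGE element's polygon onto an ALTO element, if it has one."""
    coords = page_el.find("p:Coords", ns)
    if coords is None or not coords.get("points"):
        return
    points = coords.get("points")
    hpos, vpos, width, height = _points_to_bbox(points)
    alto_el.set("HPOS", str(hpos))
    alto_el.set("VPOS", str(vpos))
    alto_el.set("WIDTH", str(width))
    alto_el.set("HEIGHT", str(height))
    # Keep the original outline as a space-separated polygon.
    shape = ET.SubElement(alto_el, _q("Shape"))
    ET.SubElement(shape, _q("Polygon"), POINTS=points.replace(",", " "))


def page_to_alto(page_xml):
    """Lossy PageXML -> ALTO conversion: regions, lines, and line text only."""
    root = ET.fromstring(page_xml)
    # Assumption: the root is namespaced, e.g. '{http://...}PcGts'.
    ns = {"p": root.tag[1:].split("}")[0]}
    page = root.find("p:Page", ns)

    ET.register_namespace("", ALTO_NS)
    alto = ET.Element(_q("alto"))
    layout = ET.SubElement(alto, _q("Layout"))
    alto_page = ET.SubElement(
        layout, _q("Page"),
        WIDTH=page.get("imageWidth", "0"),
        HEIGHT=page.get("imageHeight", "0"),
    )
    space = ET.SubElement(alto_page, _q("PrintSpace"))

    for region in page.iterfind("p:TextRegion", ns):
        block = ET.SubElement(space, _q("TextBlock"), ID=region.get("id", ""))
        _set_geometry(block, region, ns)
        for line in region.iterfind("p:TextLine", ns):
            text_line = ET.SubElement(block, _q("TextLine"), ID=line.get("id", ""))
            _set_geometry(text_line, line, ns)
            # Only the line-level transcription is kept; words, glyphs,
            # reading order, and metadata are all discarded.
            uni = line.find("p:TextEquiv/p:Unicode", ns)
            if uni is not None and uni.text:
                ET.SubElement(text_line, _q("String"), CONTENT=uni.text)
    return ET.tostring(alto, encoding="unicode")
```

Calling `page_to_alto(open("page.xml", encoding="utf-8").read())` on a file (the filename is only an example) returns the ALTO document as a string. Anything the sketch does not copy explicitly is dropped, which is the same kind of loss to expect from any regions-and-lines-only route.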
Hi, I've tried different tools (XSLT stylesheets and other software) and none of them work. What methods do you recommend for converting PageXML to ALTO? Perhaps the better way is to use the kraken library.
Thanks for your answer.