Open ghost opened 5 years ago
Could you provide more information? Are there any logs or a stack trace for the error?
The ALTO schema version didn't change; version 3.1 has been used since the first pdfalto release: https://github.com/kermitt2/pdfalto/blob/master/schema/alto.xsd
Earlier the schema in the ALTO XML was: xmlns="http://www.loc.gov/standards/alto/ns-v3#", but now I get: xmlns="http://www.loc.gov/standards/alto/v3/alto.xsd"
This was updated because the first link is wrong; it doesn't point to the schema.
@Aazhar Schema-location and Namespace URL don't have to be identical. xmlns should be http://www.loc.gov/standards/alto/ns-v3# (see targetNamespace="http://www.loc.gov/standards/alto/ns-v3#" in http://www.loc.gov/standards/alto/v3/alto.xsd)
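To illustrate why the namespace change can break downstream tools, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper names `count_strings` and `sample` and the tiny sample document are hypothetical, and this is not how hOCR-to-ALTO is actually implemented; it only shows that a namespace-qualified lookup stops matching once `xmlns` is set to the schema URL.

```python
import xml.etree.ElementTree as ET

# targetNamespace declared in the ALTO v3 schema
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v3#"
# Location of the XSD file; this is not a namespace
SCHEMA_URL = "http://www.loc.gov/standards/alto/v3/alto.xsd"


def count_strings(xml_text, ns=ALTO_NS):
    """Count String elements that belong to the given namespace (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//{%s}String" % ns))


def sample(ns):
    """Build a tiny ALTO-like document whose default namespace is `ns` (illustrative only)."""
    return (
        '<alto xmlns="%s"><Layout><Page><PrintSpace><TextBlock><TextLine>'
        '<String CONTENT="hello"/>'
        "</TextLine></TextBlock></PrintSpace></Page></Layout></alto>" % ns
    )


# A consumer that looks up elements in the ALTO namespace finds the String
# only when the document declares that namespace.
good = count_strings(sample(ALTO_NS))
bad = count_strings(sample(SCHEMA_URL))
print(good, bad)
```

With the schema URL used as `xmlns`, every element ends up in a different namespace, so a consumer written against `ns-v3#` finds nothing even though the element names are unchanged.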
For schema location, you can use something like
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/v3/alto.xsd"
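One caveat on the attribute above: per the XML Schema specification, `xsi:schemaLocation` takes whitespace-separated pairs of namespace URI and schema location, so a strict validator would expect the namespace to come before the XSD URL. A minimal sketch of a root element built that way with Python's standard `xml.etree.ElementTree` follows; it is only an illustration under that assumption, not the code pdfalto uses.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v3#"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_URL = "http://www.loc.gov/standards/alto/v3/alto.xsd"

# Serialize ALTO as the default namespace and bind the xsi prefix.
ET.register_namespace("", ALTO_NS)
ET.register_namespace("xsi", XSI_NS)

# Root element in the ALTO namespace.
root = ET.Element("{%s}alto" % ALTO_NS)

# schemaLocation is a "namespace location" pair separated by whitespace.
root.set("{%s}schemaLocation" % XSI_NS, "%s %s" % (ALTO_NS, SCHEMA_URL))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The namespace stays `ns-v3#`, and the XSD URL appears only as a hint to validators about where the schema can be fetched.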
Added xsi:schemaLocation
with d49bf77204d1700b7263cb2641aa508c33058c9c
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" xsi:schemaLocation="http://www.loc.gov/standards/alto/v3/alto.xsd">
Previously, we used pdfalto to generate ALTO XML from the PDF and then https://github.com/filak/hOCR-to-ALTO to convert the ALTO XML to an hOCR file. With the newest release of pdfalto this no longer works, since the ALTO version seems to have changed. Can you share which version of ALTO pdfalto currently produces?