sbsdev / daisyproducer

An integrated production management system for accessible media
GNU Affero General Public License v3.0

Import from ABACUS fails if the XML contains entities with namespaces #19

Open egli opened 5 years ago

egli commented 5 years ago

If the source XML contains an entity such as

<!ENTITY brlnoblank "<brl:select><brl:when-braille/><brl:otherwise>&nbsp;</brl:otherwise></brl:select>">

then validation fails when the XML is imported from Alfresco. The suspicion is that the Python XSLT transformation does not handle the namespaces inside the entity properly.
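To narrow the suspicion down, it may help to have a minimal document that declares the entity in an internal DTD subset. The sketch below is only an illustration: it uses the standard library's ElementTree rather than lxml, so it shows what the expanded entity is expected to look like, not the failing behaviour. The namespace URI, the local `nbsp` declaration and the `<p>` wrapper are assumptions made to keep the sample self-contained.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the brl prefix; adjust to whatever the real documents declare.
BRL_NS = "http://www.daisy.org/z3986/2009/braille/"

# nbsp is declared locally here (assumption) because no external DTD is loaded.
SAMPLE = """<!DOCTYPE p [
  <!ENTITY nbsp "&#160;">
  <!ENTITY brlnoblank "<brl:select><brl:when-braille/><brl:otherwise>&nbsp;</brl:otherwise></brl:select>">
]>
<p xmlns:brl="%s">a&brlnoblank;b</p>""" % BRL_NS

root = ET.fromstring(SAMPLE)

# The entity reference should expand into namespaced child elements of <p>.
select = root.find("{%s}select" % BRL_NS)
otherwise = select.find("{%s}otherwise" % BRL_NS)
```

If the same sample survives a plain parse but breaks after the XSLT step, that would point at the transformation rather than at the parser.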

Solution:

Instead of using the lxml transform method, just shell out to the standard XSLT engine that the rest of the program uses, i.e. something along the lines of https://github.com/sbsdev/daisyproducer/blob/master/daisyproducer/documents/external.py#L132:

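from os.path import join
from subprocess import Popen, PIPE

# settings is assumed to be django.conf.settings; xsl, source, args and params come from the caller
command = (
    "java",
    "-jar", join(settings.EXTERNAL_PATH, 'dtbook2sbsform', 'lib', 'saxon9he.jar'),
    "-xsl:%s" % xsl,
    "-s:%s" % source)
command = command + tuple(args)
# Saxon takes stylesheet parameters as trailing key=value arguments
command = command + tuple("%s=%s" % (key, value) for key, value in params.items())
Popen(command, stderr=PIPE, stdout=PIPE)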
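
A rough sketch of how the import could wrap this, separating command construction from execution so the former can be checked without a Java runtime. The names `build_saxon_command` and `run_saxon` are hypothetical and not taken from the repository, and the sketch assumes Python 3 (`subprocess.run`), whereas the existing code appears to target Python 2.

```python
import subprocess


def build_saxon_command(jar, xsl, source, args=(), params=None):
    """Return the argument tuple for a Saxon command-line transform."""
    command = ("java", "-jar", jar, "-xsl:%s" % xsl, "-s:%s" % source)
    command += tuple(args)
    # Stylesheet parameters go last as key=value; sorted for a stable order.
    command += tuple("%s=%s" % (key, value)
                     for key, value in sorted((params or {}).items()))
    return command


def run_saxon(jar, xsl, source, args=(), params=None):
    """Run the transform and return its stdout as bytes; raise on failure."""
    result = subprocess.run(
        build_saxon_command(jar, xsl, source, args, params),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # Surface Saxon's error output instead of failing silently.
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout
```

Passing the arguments as a tuple rather than a shell string avoids quoting problems with file names, and raising on a non-zero exit code keeps a failed transform from being mistaken for an empty result.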