transpect / idml2xml-frontend

Converts InDesign IDML to XML
BSD 2-Clause "Simplified" License
15 stars 7 forks source link

Catalog resolution problems when running on AWS Lambda #2

Closed abdelazer closed 4 years ago

abdelazer commented 4 years ago

Thanks for all your work on this family of libraries. Please let me know if you'd prefer this Issue was filed inside another project.

I'm trying to create an environment to run the idml2xml pipeline inside AWS Lambda and have found the configuration process quite challenging. The latest problem: in some cases catalog-powered URI resolution errors when it gets an encoded %2F value instead of the / it expects.

I would like your guidance on how to continue debugging this issue.

Here's the details I think I've established:

2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Sink:169 - Running sink sink1
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.22.2.12.2
18:59:07
Message: Calling resolve-uri() with $catalog-resolved-uri value: -%2F- and @local-href value: --
18:59:07
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b ERROR Main:140 - Pipeline failed: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b ERROR Main:143 - Underlying exception: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Main:145 - An empty sequence is not allowed as the second argument of fn:resolve-uri()
18:59:08
net.sf.saxon.s9api.SaxonApiUncheckedException: An empty sequence is not allowed as the second argument of fn:resolve-uri()
18:59:08
at net.sf.saxon.s9api.XPathSelector.iterator(XPathSelector.java:203) ~[Saxon-HE-9.9.1-5.jar:?]
      <p:otherwise>
        <p:documentation>Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be
          a relative path.</p:documentation>
        <!-- KHF -->
        <cx:message>
          <p:with-option name="message"
            select="concat('Calling resolve-uri() with $catalog-resolved-uri value: -', $catalog-resolved-uri, '- and @local-href value: -', /c:result/@local-href,  '-')"/>
        </cx:message>
        <tr:file-uri name="resolved-uri">
          <p:with-option name="filename" select="resolve-uri($catalog-resolved-uri, /c:result/@local-href)">
            <p:pipe port="result" step="cwd-uri"/>
          </p:with-option>
        </tr:file-uri>
        <tr:unescape-uri attribute-names="os-path"/>
      </p:otherwise>
Picked up JAVA_TOOL_OPTIONS: -Dxml.catalog.files=/var/task/calabash/xmlcatalog/catalog.xml -Dlog4j.configurationFile=log4j2.xml
START RequestId: Version: $LATEST
2020-01-10 19:52:31 INFO  IDMLHandler:32 - Handling S3 PUT to source-idml-bucket-758045473916/StoneRiverCrossing.idml
2020-01-10 19:52:34 INFO  IDMLHandler:65 - Downloading IDML file from S3 (arn:aws:s3:::source-idml-bucket-758045473916/StoneRiverCrossing.idml) to /tmp/input.idml
2020-01-10 19:52:34 INFO  IDMLHandler:47 - Invoking XProc pipline (/var/task/xpl/idml2xml-frontend.xpl) with config /var/task/transpect-config.xml
2020-01-10 19:52:37 DEBUG XProcConfiguration:426 - XML Calabash searching for Saxon extension functions on the class path
2020-01-10 19:52:37 DEBUG XProcRuntime:170 - XML Calabash version 1.1.30 (for Saxon 9.9.1.5/HE)
2020-01-10 19:52:40 INFO  Catalog:338 - Resolved: http://transpect.io/idml2xml/xpl/idml2hub.xpl
2020-01-10 19:52:41 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/unzip-extension/unzip-declaration.xpl
2020-01-10 19:52:41 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/recursive-directory-list/xpl/recursive-directory-list.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:43 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 19:52:44 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:45 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 19:52:45 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-hub-xml-model.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/mathtype-extension/xpl/mathtype2mml-declaration.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:55 DEBUG XSLT:169 - Running xslt !1.2.7.3.3
2020-01-10 19:52:58 DEBUG Store:169 - Running store !1.2.7.3.4: href=/tmp/CB-a6b8a81d-6aa9-4a83-8dd2-200e58b2b931_idml2hub-start.txt
2020-01-10 19:52:58 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 19:52:58 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 19:52:58 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 19:53:02 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 19:53:02 DEBUG Sink:169 - Running sink !1.17
2020-01-10 19:53:02 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 19:53:02 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 19:53:02 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 19:53:03 DEBUG Sink:169 - Running sink !1.17
2020-01-10 19:53:03 DEBUG Identity:169 - Running identity !1.19.1.2
2020-01-10 19:53:03 DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 19:53:03 DEBUG Sink:169 - Running sink sink1
2020-01-10 19:53:03 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.22.2.5.2
Message: KHF: Correcting $catalog-resolved-uri value: -%2F
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.3
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.4
2020-01-10 19:53:03 DEBUG XSLT:169 - Running xslt unescape
2020-01-10 19:53:03 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/hex/xsl/hex.xsl
2020-01-10 19:53:04 DEBUG AddAttribute:169 - Running add-attribute lastpath
2020-01-10 19:53:04 DEBUG AddAttribute:169 - Running add-attribute lastpath-os

What would you recommend to debug this further?

Apologies on the length of this report, but I wanted to add the context I have. My partially working code is here, although I've (clumsily, given my lack of proficiency with Java) tried to do as little as possible to get Calabash and idml2xml-frontend running as possible: https://github.com/codeburl/transpect-on-lambda

gimsieke commented 4 years ago

Hi Keith,

Sorry that getting it to run caused so much trouble so far.

I see that you are using Calabash for Saxon 9.9. There have been some issues with Saxon 9.9. (and base URIs in particular) that I didn’t manage to look into for some time now.

On the other hand idml2xml-frontend is configured to use a calabash-frontend branch as a submodule, and it seems to use a Saxon 9.9 Calabash, so I guess that it should be working in principle.

I see that the steps can be located using catalog resolution.

I wonder why $catalog-resolved-uri has the value %2F though. A catalog resolved URI should at least be a URI, starting with file:. Can you uncomment the message that outputs the attributes of the pos:info result and tell me what it reports?

gimsieke commented 4 years ago

Maybe there is no cwd on Lambda, that’s what I suspect or at least factor in as a possible cause. Since I didn’t anticipate that, unpredictable results may ensue.

abdelazer commented 4 years ago

Thank you for the suggestions. I've uncommented the message you suggested and included the output logs here (it does seem to have a cwd):

2020-01-10T23:59:31.995000 START RequestId: Version: $LATEST
2020-01-10T23:59:31.999000 2020-01-10 23:59:31 INFO  IDMLHandler:32 - Handling S3 PUT to source-idml-bucket-y/x.idml
2020-01-10 23:59:32 INFO  IDMLHandler:66 - Downloading IDML file from S3 (arn:aws:s3:::source-idml-bucket-y/x.idml) to /tmp/input.idml
2020-01-10 23:59:32 INFO  IDMLHandler:48 - Invoking XProc pipline (/var/task/xpl/idml2xml-frontend.xpl) with config /var/task/transpect-config.xml
2020-01-10 23:59:32 DEBUG XProcConfiguration:426 - XML Calabash searching for Saxon extension functions on the class path
2020-01-10 23:59:32 DEBUG XProcRuntime:170 - XML Calabash version 1.1.30 (for Saxon 9.9.1.5/HE)
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/idml2xml/xpl/idml2hub.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/unzip-extension/unzip-declaration.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/recursive-directory-list/xpl/recursive-directory-list.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-hub-xml-model.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/mathtype-extension/xpl/mathtype2mml-declaration.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt !1.2.7.3.3
2020-01-10 23:59:33 DEBUG Store:169 - Running store !1.2.7.3.4: href=/tmp/CB-7a076f5d-ab17-457a-8f43-d27a7aadf219_idml2hub-start.txt
2020-01-10 23:59:33 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 23:59:33 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.16
2020-01-10T23:59:33.693000 Message: pos:info file-separator=/ path-separator=: os-architecture=amd64 os-name=Linux os-version=4.14.138-99.102.amzn2.x86_64 cwd=/ user-name=sbx_user1051 user-home=/home/sbx_user1051 
2020-01-10 23:59:33 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 23:59:33 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 23:59:33 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.16
2020-01-10T23:59:33.914000 Message: pos:info file-separator=/ path-separator=: os-architecture=amd64 os-name=Linux os-version=4.14.138-99.102.amzn2.x86_64 cwd=/ user-name=sbx_user1051 user-home=/home/sbx_user1051 
2020-01-10 23:59:33 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 23:59:34 DEBUG Identity:169 - Running identity !1.20.1.2
2020-01-10 23:59:34 DEBUG Sink:169 - Running sink !1.18
2020-01-10 23:59:34 DEBUG Sink:169 - Running sink sink1
2020-01-10 23:59:34 DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 23:59:34 ERROR Main:140 - Pipeline failed: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 23:59:34 ERROR Main:143 - Underlying exception: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 23:59:34 DEBUG Main:145 - An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10T23:59:34.174000 net.sf.saxon.s9api.SaxonApiUncheckedException: An empty sequence is not allowed as the second argument of fn:resolve-uri()
gimsieke commented 4 years ago

Apparently tr:file-uri doesn’t like '/' as the filename option value. I’m going to fix it.

gimsieke commented 4 years ago

I committed something. Can you pull xproc-util and xslt-util manually and see whether it will improve things?

abdelazer commented 4 years ago

The issue is resolved after updating those three files you modified in xproc-util and xslt-util yesterday and I'm now getting Hub XML out of Lambda. Thanks!

I appreciate you jumping on this issue so quickly.

gimsieke commented 4 years ago

I appreciate you jumping on this issue so quickly.

Despite a cold and an imminent wedding party, I might add!

Filename to URI conversion will be a standard function of XProc 3.0.

We will eventually migrate all transpect libraries to XProc 3.0, keeping the current modules in maintenance mode. There will be a migration tool for 1.0 pipelines.

If you use non-XProc tools to postprocess the flat, DocBook-based XML that comes out of idml2xml, you won’t need to migrate that many pipelines though. You just wait until idml2xml-frontend gets migrated.