linkedpipes / etl

LinkedPipes ETL is an RDF based, lightweight ETL tool
https://etl.linkedpipes.com

XSLT transformer Error - Prolog #801

Open versant2612 opened 4 years ago

versant2612 commented 4 years ago

I have two pipelines with the same error. Both use the same input XML file but different XSL stylesheets.

I've already used these XSLT files with the xsltproc command-line tool, and they worked.

Component: XSLT transformer

Status: Failed

Start: 2020-03-11 18:25:28

End: 2020-03-11 18:25:28

Duration: 00:00:00

Cause: PipelineComponent execution failed.

Root cause: SAXParseException : Content is not allowed in prolog.

Messages:

Progress 0 / 0 2020-03-11 18:25:28

jakubklimek commented 4 years ago

Again, without the pipeline, I am only guessing here. This could mean that your XSLT stylesheet in the configuration of the XSLT transformer is not a valid XML file. See the sample pipeline in the XSLT transformer documentation for an example.

versant2612 commented 4 years ago

image

Here is the pipeline. The xsl file I got from https://sourceforge.net/projects/xmltordf/files/xml2rdf3.xsl/download

jakubklimek commented 4 years ago

@versant2612 can you share the pipeline itself? (download its definition and share it)

versant2612 commented 4 years ago

@jakubklimek

Sorry for the delay, I had some problems accessing my server.

XML to RDF com xml2rdf3.xsl.zip

jakubklimek commented 4 years ago

@versant2612 OK, you have entered a file path to the XSLT template. However, the component expects the actual template (its content) to be pasted into that field.
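The failure described here can be reproduced outside LinkedPipes ETL. The following is a minimal sketch in Python, assuming the stylesheet field received a path string instead of XML; the path and the trimmed stylesheet are hypothetical stand-ins, and Python's parser words the error differently from the Java SAXParseException shown in the execution log.

```python
import xml.etree.ElementTree as ET

# Hypothetical value: a file path typed where the stylesheet text was expected.
path_as_config = "/home/user/xml2rdf3.xsl"

# What the field should contain: the stylesheet text itself (trimmed here).
stylesheet_as_config = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
)


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(path_as_config))        # a bare path is not XML
print(is_well_formed(stylesheet_as_config))  # stylesheet text parses
```

A path starts with characters that cannot appear before the root element, which is the situation a Java XML parser reports as "Content is not allowed in prolog".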

versant2612 commented 4 years ago

I've copied and pasted the content. Now the pipeline executes with a warning and doesn't generate the RDF. I have another XSLT and a second pipeline with the same steps and the same input data. That one executes without a warning but doesn't generate the RDF either. Is a component missing from my pipeline? Do you want the input XML file?

WARNING

cloud-di@vmm-template:~/etl/deploy$ Warning: on line 22 The attribute axis starting at a document node will never select anything

OUTPUT

cloud-di@vmm-template:~/etl/deploy$ more /home/cloud-di/lattes-data/professores.rdf
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>

CONTENT

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xs="http://www.w3.org/TR/2008/REC-xml-20081126#">

   <xsl:strip-space elements="*"/>
   <xsl:output method="xml" indent="yes"/>

   <xsl:param name="BaseURI"><!-- Define if not completed by application-->
          <!-- http://www.exampleURI.net/STELLA-I -->
   </xsl:param>

   <!-- Begin RDF document -->
   <xsl:template match="/">
          <xsl:element name="rdf:RDF">
                 <rdf:Description>
                        <xsl:attribute name="rdf:about"/>
                        <xsl:apply-templates select="/*|/@*"/>
                 </rdf:Description>
          </xsl:element>
   </xsl:template>

   <!-- Turn XML elements into RDF triples. -->
   <xsl:template match="*">
          <xsl:param name="subjectname"/>

          <!-- Build URI for subject resources from ancestor elements -->
          <xsl:variable name="newsubjectname">
                 <xsl:if test="$subjectname=''">
                        <xsl:value-of select="$BaseURI"/>
                        <xsl:text>#</xsl:text>
                 </xsl:if>
                 <xsl:value-of select="$subjectname"/>
                 <xsl:value-of select="name()"/>
                 <!-- Add an ID to sibling element of identical name -->
                 <xsl:variable name="number">
                        <xsl:number/>
                 </xsl:variable>
                 <xsl:if test="$number > 1">
                        <xsl:text>_</xsl:text>
                        <xsl:number/>
                 </xsl:if>
          </xsl:variable>

          <xsl:element name="{name()}" namespace="{concat(namespace-uri(),'#')}">
                 <rdf:Description>
                        <xsl:attribute name="rdf:about">
                               <xsl:value-of select="$newsubjectname"/>
                        </xsl:attribute>
                        <xsl:apply-templates select="@*|node()">
                               <xsl:with-param name="subjectname"
                                      select="concat($newsubjectname,'/')"/>
                        </xsl:apply-templates>
                 </rdf:Description>
          </xsl:element>

          <!-- rdf:_no triple to preserve the order of elements,
               comment out if not needed -->
          <xsl:if test="count(../*) >1">
                 <xsl:element name="{concat('rdf:_',count(preceding-sibling::*)+1)}">
                        <rdf:Description>
                               <xsl:attribute name="rdf:about">
                                      <xsl:value-of select="$newsubjectname"/>
                               </xsl:attribute>
                        </rdf:Description>
                 </xsl:element>
          </xsl:if>
   </xsl:template>

   <!-- Create attribute triples. -->
   <xsl:template match="@*" name="attributes">
          <xsl:variable name="ns">
                 <!-- If attribute doesn't have a namespace use element namespace -->
                 <xsl:choose>
                        <xsl:when test="namespace-uri()=''">
                               <xsl:value-of select="concat(namespace-uri(..),'#')"/>
                        </xsl:when>
                        <xsl:otherwise>
                               <xsl:value-of select="concat(namespace-uri(),'#')"/>
                        </xsl:otherwise>
                 </xsl:choose>
          </xsl:variable>
          <xsl:element name="{name()}" namespace="{$ns}">
                 <xsl:value-of select="."/>
          </xsl:element>
   </xsl:template>

   <!-- Enclose text in an rdf:value element -->
   <xsl:template match="text()">
          <xsl:element name="rdf:value">
                 <xsl:value-of select="."/>
          </xsl:element>
   </xsl:template>

   <!-- Add triple to preserve comments -->
   <xsl:template match="comment()">
          <xsl:element name="xs:comment">
                 <xsl:value-of select="."/>
          </xsl:element>
   </xsl:template>

</xsl:stylesheet>
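The warning quoted in this comment most likely refers to `select="/*|/@*"` in the root template (an assumption, since the line numbering of the original file is not shown). In the XPath data model the document node has no attributes, so the `/@*` branch is always empty, while the `/*` branch still selects the root element. The sketch below illustrates this with Python's `xml.dom.minidom` and a tiny stand-in document, not the actual input file from this thread.

```python
from xml.dom import minidom

# Tiny stand-in input; the real input XML from the thread is not reproduced here.
doc = minidom.parseString('<root id="1"><child/></root>')

# The document node ("/" in XPath) is the parent of the root element.
# It carries no attributes, so "/@*" has nothing to select.
document_attrs = doc.attributes

# Attributes live on element nodes, e.g. the root element ("/*/@*" in XPath).
root_attrs = dict(doc.documentElement.attributes.items())

print(document_attrs)
print(root_attrs)
```

The warning therefore flags a branch that does nothing; on its own it does not explain why the output contains no triples.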

jakubklimek commented 4 years ago

@versant2612 First of all, does the XSLT generate the RDF in another XSLT processor? Which one? If yes, then if you can, please provide the input file (or a relevant part of it).

versant2612 commented 4 years ago

Yes. The command line is:

xsltproc --stringparam BaseURI http://lattes.cnpq.br/8164403687403639# xml2rdf3.xsl Lattes8164403687403639.xml > Lattes8164403687403639.rdf3

versant2612 commented 4 years ago

LinkedPipeXML.zip

versant2612 commented 4 years ago

The RDF file generated by xml2rdf3.xsl was successfully imported into a repository through AllegroGraph WebView 6.4.2.

image

jakubklimek commented 4 years ago

See this execution. It contains your XSLT and your input XML and produces the same file you have on the output - no configuration necessary.

I noticed that you set the BaseURI XSLT parameter on the command line. The same can be done by generating a runtime configuration for the XSLT transformer component, as shown at the bottom of the component documentation.