ropensci / xslt

Extension of xml2 package for xsl transformations
https://docs.ropensci.org/xslt

Unable to run second transformation without saving first result to disk #12

Closed ParfaitG closed 9 months ago

ParfaitG commented 9 months ago

To transform a KML file into a form suitable for building a data frame, I attempted back-to-back calls to xml_xslt(). This does not yield the correct result, yet no error is raised. Oddly, only the root node is output.

However, saving the first transformation to disk with xml2::write_xml(), reading it back with xml2::read_xml(), and then running the second xml_xslt() does produce the correct, desired result. See the reproducible example below, along with the source files.

Could the issue involve handling of the default KML namespace? The outputs of read_xml() and xml_xslt() are both xml_document objects. My XSLT 1.0 scripts are fully compliant and were validated with xsltproc on Linux and with an online fiddle.

R (see differences in final_doc)

library(xml2)
library(xslt)

# DOWNLOAD KML FILE
tmp <- tempfile()
download.file(
  paste0(
    "https://data.cityofchicago.org/download/rytz-fq6y/",
    "application%2Fvnd.google-earth.kmz"
  ),
  destfile = tmp,
  mode = "wb"
)
unzip(tmp, files = "doc.kml")
unlink(tmp)
# trying URL 'https://data.cityofchicago.org/download/rytz-fq6y/application%2Fvnd.google-earth.kmz'
# downloaded 648 KB

# READ XML AND XSLT
doc <- read_xml("doc.kml")
style1 <- read_xml("style1.xsl")
style2 <- read_xml("style2.xsl")

# BACK TO BACK TRANSFORMATIONS
new_doc <- xml_xslt(doc, style1)
final_doc <- xml_xslt(new_doc, style2)

final_doc
# {xml_document}
# <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

# WRITE / READ XML
write_xml(new_doc, "new_doc.xml")
doc2 <- read_xml("new_doc.xml")

# RUN SECOND TRANSFORMATION
final_doc <- xml_xslt(doc2, style2)

final_doc
# {xml_document}
# <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
#  [1] <DATA>\n  <ROUTE>1</ROUTE>\n  <ROUTE0>001</ROUTE0>\n  <NAME>BRONZEVILLE/UNION STATION</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>34690.953676</ ...
#  [2] <DATA>\n  <ROUTE>2</ROUTE>\n  <ROUTE0>002</ROUTE0>\n  <NAME>HYDE PARK EXPRESS</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>110607.498776</SHAPE.L ...
#  [3] <DATA>\n  <ROUTE>3</ROUTE>\n  <ROUTE0>003</ROUTE0>\n  <NAME>KING DRIVE</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>88297.447622</SHAPE.LEN>\n</D ...
#  [4] <DATA>\n  <ROUTE>4</ROUTE>\n  <ROUTE0>004</ROUTE0>\n  <NAME>COTTAGE GROVE</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>106219.449701</SHAPE.LEN>\ ...
#  [5] <DATA>\n  <ROUTE>5</ROUTE>\n  <ROUTE0>005</ROUTE0>\n  <NAME>SOUTH SHORE NIGHT BUS</NAME>\n  <WKDAY>0</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>67048.136707</SHAP ...
#  [6] <DATA>\n  <ROUTE>6</ROUTE>\n  <ROUTE0>006</ROUTE0>\n  <NAME>JACKSON PARK EXPRESS</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>99573.511134</SHAPE ...
#  [7] <DATA>\n  <ROUTE>7</ROUTE>\n  <ROUTE0>007</ROUTE0>\n  <NAME>HARRISON</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>67830.691765</SHAPE.LEN>\n</DATA>
#  [8] <DATA>\n  <ROUTE>8</ROUTE>\n  <ROUTE0>008</ROUTE0>\n  <NAME>HALSTED</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>77133.131835</SHAPE.LEN>\n</DATA>
#  [9] <DATA>\n  <ROUTE>8A</ROUTE>\n  <ROUTE0>008A</ROUTE0>\n  <NAME>SOUTH HALSTED</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>66546.30805</SHAPE.LEN>\ ...
# [10] <DATA>\n  <ROUTE>9</ROUTE>\n  <ROUTE0>009</ROUTE0>\n  <NAME>ASHLAND</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>100141.31991</SHAPE.LEN>\n</DATA>
# [11] <DATA>\n  <ROUTE>X9</ROUTE>\n  <ROUTE0>009X</ROUTE0>\n  <NAME>ASHLAND EXPRESS</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>96064.210775</SHAPE.LE ...
# [12] <DATA>\n  <ROUTE>10</ROUTE>\n  <ROUTE0>010</ROUTE0>\n  <NAME>MUSEUM OF SCIENCE &amp; INDUSTRY</NAME>\n  <WKDAY>0</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>79579. ...
# [13] <DATA>\n  <ROUTE>11</ROUTE>\n  <ROUTE0>011</ROUTE0>\n  <NAME>LINCOLN</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>24694.573889</SHAPE.LEN>\n</DATA>
# [14] <DATA>\n  <ROUTE>12</ROUTE>\n  <ROUTE0>012</ROUTE0>\n  <NAME>ROOSEVELT</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>62586.111408</SHAPE.LEN>\n</D ...
# [15] <DATA>\n  <ROUTE>15</ROUTE>\n  <ROUTE0>015</ROUTE0>\n  <NAME>JEFFERY LOCAL</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>63343.258492</SHAPE.LEN>\ ...
# [16] <DATA>\n  <ROUTE>18</ROUTE>\n  <ROUTE0>018</ROUTE0>\n  <NAME>16TH/18TH</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>48076.975753</SHAPE.LEN>\n</D ...
# [17] <DATA>\n  <ROUTE>19</ROUTE>\n  <ROUTE0>019</ROUTE0>\n  <NAME>UNITED CENTER EXPRESS</NAME>\n  <WKDAY>0</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>32037.427404</SHA ...
# [18] <DATA>\n  <ROUTE>20</ROUTE>\n  <ROUTE0>020</ROUTE0>\n  <NAME>MADISON</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>54335.296018</SHAPE.LEN>\n</DATA>
# [19] <DATA>\n  <ROUTE>21</ROUTE>\n  <ROUTE0>021</ROUTE0>\n  <NAME>CERMAK</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>85014.706492</SHAPE.LEN>\n</DATA>
# [20] <DATA>\n  <ROUTE>22</ROUTE>\n  <ROUTE0>022</ROUTE0>\n  <NAME>CLARK</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>73447.885542</SHAPE.LEN>\n</DATA>
# ...

Sources

KML

Chicago Transit Authority: CTA - Bus Routes KML

XSLT 1

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:k="http://www.opengis.net/kml/2.2">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/k:kml">
     <xsl:copy>
       <xsl:apply-templates select="descendant::k:description"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="k:description">
     <xsl:copy>
       <xsl:value-of select="substring-before(substring-after(., '&lt;/head&gt;'), '&lt;/html&gt;')" disable-output-escaping="yes"/>
     </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
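The core of this first stylesheet is the substring-before(substring-after(., '</head>'), '</html>') expression, which keeps only what lies between `</head>` and `</html>` in each description. XPath 1.0 returns an empty string from both functions when the separator is absent. The two Python functions below are my own approximation of that behaviour, applied to a made-up description string; they are not part of xslt or xml2.

```python
def substring_after(s, sep):
    # Approximates XPath 1.0 substring-after(): text after the first sep, else ""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def substring_before(s, sep):
    # Approximates XPath 1.0 substring-before(): text before the first sep, else ""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


# Made-up description content; the real CTA descriptions are much longer.
description = "<html><head><title>x</title></head><body><table></table></body></html>"
print(substring_before(substring_after(description, "</head>"), "</html>"))
# <body><table></table></body>
```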

XSLT 2

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:k="http://www.opengis.net/kml/2.2">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/k:kml">
     <xsl:copy>
       <xsl:apply-templates select="k:description">
         <xsl:sort select="descendant::k:td[text()='ROUTE0']/following-sibling::k:td"/>
       </xsl:apply-templates>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="k:description">
       <xsl:apply-templates select="descendant::k:table[2]"/>
    </xsl:template>

    <xsl:template match="k:table">
     <xsl:element name="DATA" namespace="http://www.opengis.net/kml/2.2">
       <xsl:apply-templates select="k:tr"/>
     </xsl:element>
    </xsl:template>

    <xsl:template match="k:tr">
      <xsl:element name="{k:td[1]}" namespace="http://www.opengis.net/kml/2.2">
        <xsl:value-of select="k:td[2]"/>
      </xsl:element>
    </xsl:template>

</xsl:stylesheet>
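This bears on the namespace question above: once the HTML is re-parsed inside `<kml xmlns="http://www.opengis.net/kml/2.2">`, its unprefixed elements belong to the KML namespace, which is why this stylesheet has to match k:table, k:tr and k:td. Below is a rough ElementTree sketch of what the stylesheet computes, run on a made-up fragment (the first table's content is a placeholder; the two records reuse names from the output above). It illustrates the logic only and says nothing about how xslt works internally.

```python
import xml.etree.ElementTree as ET

KML = "{http://www.opengis.net/kml/2.2}"

# Made-up fragment shaped like new_doc after re-parsing. The HTML tags carry no
# prefix, so they inherit the default KML namespace.
doc = ET.fromstring(
    '<kml xmlns="http://www.opengis.net/kml/2.2">'
    "<description><table><tr><td>x</td></tr></table><table>"
    "<tr><td>ROUTE0</td><td>002</td></tr><tr><td>NAME</td><td>HYDE PARK EXPRESS</td></tr>"
    "</table></description>"
    "<description><table><tr><td>x</td></tr></table><table>"
    "<tr><td>ROUTE0</td><td>001</td></tr><tr><td>NAME</td><td>BRONZEVILLE/UNION STATION</td></tr>"
    "</table></description>"
    "</kml>"
)


def to_record(description):
    # Like descendant::k:table[2]: the second table below <description>, in document order.
    table = description.findall(f".//{KML}table")[1]
    # Like the k:tr template: the first cell names the field, the second holds the value.
    return {tr[0].text: tr[1].text for tr in table.findall(f"{KML}tr")}


# Like the xsl:sort on the ROUTE0 cell: a plain text sort.
records = sorted(
    (to_record(d) for d in doc.findall(f"{KML}description")),
    key=lambda r: r["ROUTE0"],
)
for r in records:
    print(r)
```

An unqualified query such as `doc.find("description")` returns None here, which mirrors why a stylesheet without the `k:` prefix would match nothing.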

Session

> sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xslt_1.4.4 xml2_1.3.6

loaded via a namespace (and not attached):
 [1] compiler_4.2.1  fastmap_1.1.0   cli_3.4.1       htmltools_0.5.3 tools_4.2.1     rstudioapi_0.13 yaml_2.3.6      Rcpp_1.0.9      rmarkdown_2.14  knitr_1.39      xfun_0.31      
[12] digest_0.6.30   rlang_1.1.2     evaluate_0.15 
jeroen commented 9 months ago

I am not sure what is going on here, but your input document seems to have large embedded HTML <![CDATA[ blobs inside the XML, and your first XSL then uses disable-output-escaping.

As a result, your new_doc object contains large blobs of unparsed HTML text. Therefore the second transformation cannot match any elements inside them; it is just text.

xml2::xml_child(new_doc)                 # the first <description> element
xml2::xml_text(xml2::xml_child(new_doc)) # its content: one long string of HTML

Once you write the HTML text to disk with escaping disabled and read it back, the HTML actually gets parsed into an XML tree. But I think what you want to do is parse the individual HTML blobs?
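The same situation can be reproduced without XSLT. In this stdlib-only Python sketch (the element and the HTML fragment are made up for illustration), an element whose text merely looks like markup has no child elements, so any query for `<td>` elements comes back empty even though the markup is present as a string.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for one <description> produced by style1.xsl: the HTML
# table is held in a single text node rather than as child elements.
desc = ET.Element("description")
desc.text = "<table><tr><td>ROUTE</td><td>1</td></tr></table>"

print(len(desc))                # 0: no child elements at all
print(desc.findall(".//td"))    # []: nothing for a td template to match
print(desc.text.count("<td>"))  # 2: the markup is still there, as plain text
```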

ParfaitG commented 9 months ago

Good point! I suspected the CDATA parsing might be the issue. Since my actual use case is more complex, my solution requires several conversions anyway. Still, I can avoid writing to disk by calling read_xml() on the character conversion of the XSLT result.

# READ XML AND XSLT
doc <- read_xml("doc.kml", package = "xslt")
style1 <- read_xml("style1.xsl", package = "xslt")
style2 <- read_xml("style2.xsl", package = "xslt")

# RUN FIRST TRANSFORMATION
new_doc <- xml_xslt(doc, style1) |> as.character() |> read_xml()
final_doc <- xml_xslt(new_doc, style2)

final_doc
# {xml_document}
# <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
#  [1] <DATA>\n  <ROUTE>1</ROUTE>\n  <ROUTE0>001</ROUTE0>\n  <NAME>BRONZEVILLE/UNION STATION</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>34690.953676</ ...
#  [2] <DATA>\n  <ROUTE>2</ROUTE>\n  <ROUTE0>002</ROUTE0>\n  <NAME>HYDE PARK EXPRESS</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>110607.498776</SHAPE.L ...
#  [3] <DATA>\n  <ROUTE>3</ROUTE>\n  <ROUTE0>003</ROUTE0>\n  <NAME>KING DRIVE</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>88297.447622</SHAPE.LEN>\n</D ...
#  [4] <DATA>\n  <ROUTE>4</ROUTE>\n  <ROUTE0>004</ROUTE0>\n  <NAME>COTTAGE GROVE</NAME>\n  <WKDAY>1</WKDAY>\n  <SAT>1</SAT>\n  <SUN>1</SUN>\n  <SHAPE.LEN>106219.449701</SHAPE.LEN>\ ...
#  [5] <DATA>\n  <ROUTE>5</ROUTE>\n  <ROUTE0>005</ROUTE0>\n  <NAME>SOUTH SHORE NIGHT BUS</NAME>\n  <WKDAY>0</WKDAY>\n  <SAT>0</SAT>\n  <SUN>0</SUN>\n  <SHAPE.LEN>67048.136707</SHAP
# ...
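Why the in-memory round trip works can be sketched with Python's ElementTree. ElementTree has no XSLT support, so `serialize_unescaped()` below is a hypothetical helper standing in for libxslt's output. As far as I understand, libxslt flags text written with disable-output-escaping="yes" so that serializing it emits raw markup; normal serialization escapes it instead, and re-parsing then only recovers the same text.

```python
import xml.etree.ElementTree as ET

desc = ET.Element("description")
desc.text = "<table><tr><td>ROUTE0</td><td>001</td></tr></table>"

# Normal serialization escapes the markup (&lt;table&gt;...), so re-parsing
# yields the same text node again and no <table> element.
escaped = ET.tostring(desc, encoding="unicode")
print(ET.fromstring(escaped).find("table"))  # None


def serialize_unescaped(elem):
    """Hypothetical helper for a childless element: write its text raw,
    roughly what happens to text emitted with disable-output-escaping."""
    return f"<{elem.tag}>{elem.text or ''}</{elem.tag}>"


# Raw serialization followed by parsing produces real elements.
reparsed = ET.fromstring(serialize_unescaped(desc))
print(reparsed.find("table/tr/td").text)  # ROUTE0
```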

With a different, simpler example that involves no CDATA parsing, back-to-back XSLT transformations work as expected without any conversions:

# READ XML AND XSLT
doc <- read_xml("Input.xml")
style1 <- read_xml("style1.xsl")
style2 <- read_xml("style2.xsl")

# RUN TRANSFORMATIONS
new_doc <- xml_xslt(doc, style1)
final_doc <- xml_xslt(new_doc, style2)

final_doc
# {xml_document}
# <data>
# [1] <aggdata>\n  <industry>Media</industry>\n  <SumOfRevenue>1.90416e+11</SumOfRevenue>\n  <AvgOfAssets>7.84346e+10</AvgOfAssets>\n  <AvgOfEquity>3.06608e+10</AvgOfEquity>\n  <Ma ...
# [2] <aggdata>\n  <industry>Oil &amp; Gas</industry>\n  <SumOfRevenue>7.6821e+11</SumOfRevenue>\n  <AvgOfAssets>1.535778e+11</AvgOfAssets>\n  <AvgOfEquity>8.12524e+10</AvgOfEquity ...
# [3] <aggdata>\n  <industry>Pharmaceuticals</industry>\n  <SumOfRevenue>2.10975e+11</SumOfRevenue>\n  <AvgOfAssets>9.49038e+10</AvgOfAssets>\n  <AvgOfEquity>4.6162e+10</AvgOfEquit ...

XML

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <bigcompany>
        <company>Company OA</company>
        <industry>Oil &amp; Gas</industry>
        <revenue>394105000000</revenue>
        <assets>349493000000</assets>
        <equity>174399000000</equity>
        <netincome>32520000000</netincome>
        <stockprice>89.38</stockprice>
        <employees>75300</employees>
    </bigcompany>
    <bigcompany>
        <company>Company OB</company>
        <industry>Oil &amp; Gas</industry>
        <revenue>200494000000</revenue>
        <assets>266026000000</assets>
        <equity>156191000000</equity>
        <netincome>19241000000</netincome>
        <stockprice>108.62</stockprice>
        <employees>64700</employees>
    </bigcompany>
    <bigcompany>
        <company>Company OC</company>
        <industry>Oil &amp; Gas</industry>
        <revenue>13807000000</revenue>
        <assets>4726000000</assets>
        <equity>16445000000</equity>
        <netincome>2720000000</netincome>
        <stockprice>48.5</stockprice>
        <employees>22000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company OD</company>
        <industry>Oil &amp; Gas</industry>
        <revenue>97800000000</revenue>
        <assets>30500000000</assets>
        <equity>10800000000</equity>
        <netincome>2700000000</netincome>
        <stockprice>27.53</stockprice>
        <employees>45340</employees>
    </bigcompany>
    <bigcompany>
        <company>Company OE</company>
        <industry>Oil &amp; Gas</industry>
        <revenue>62004000000</revenue>
        <assets>117144000000</assets>
        <equity>48427000000</equity>
        <netincome>8428000000</netincome>
        <stockprice>66.66</stockprice>
        <employees>16900</employees>
    </bigcompany>
    <bigcompany>
        <company>Company PA</company>
        <industry>Pharmaceuticals</industry>
        <revenue>49605000000</revenue>
        <assets>169274000000</assets>
        <equity>71622000000</equity>
        <netincome>9135000000</netincome>
        <stockprice>30.14</stockprice>
        <employees>78000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company PB</company>
        <industry>Pharmaceuticals</industry>
        <revenue>48047000000</revenue>
        <assets>105128000000</assets>
        <equity>56943000000</equity>
        <netincome>6272000000</netincome>
        <stockprice>55.43</stockprice>
        <employees>76000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company PC</company>
        <industry>Pharmaceuticals</industry>
        <revenue>74331000000</revenue>
        <assets>131119000000</assets>
        <equity>69752000000</equity>
        <netincome>16323000000</netincome>
        <stockprice>102.31</stockprice>
        <employees>126500</employees>
    </bigcompany>
    <bigcompany>
        <company>Company PD</company>
        <industry>Pharmaceuticals</industry>
        <revenue>23113000000</revenue>
        <assets>35249000000</assets>
        <equity>17641000000</equity>
        <netincome>4685000000</netincome>
        <stockprice>67.2</stockprice>
        <employees>37925</employees>
    </bigcompany>
    <bigcompany>
        <company>Company PE</company>
        <industry>Pharmaceuticals</industry>
        <revenue>15879000000</revenue>
        <assets>33749000000</assets>
        <equity>14852000000</equity>
        <netincome>2004000000</netincome>
        <stockprice>58</stockprice>
        <employees>28000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company MA</company>
        <industry>Media</industry>
        <revenue>48813000000</revenue>
        <assets>84186000000</assets>
        <equity>44958000000</equity>
        <netincome>8004000000</netincome>
        <stockprice>93.65</stockprice>
        <employees>180000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company MB</company>
        <industry>Media</industry>
        <revenue>64657000000</revenue>
        <assets>158813000000</assets>
        <equity>51058000000</equity>
        <netincome>7135000000</netincome>
        <stockprice>57.05</stockprice>
        <employees>139000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company MC</company>
        <industry>Media</industry>
        <revenue>31867000000</revenue>
        <assets>54793000000</assets>
        <equity>17418000000</equity>
        <netincome>4514000000</netincome>
        <stockprice>36.52</stockprice>
        <employees>27000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company MD</company>
        <industry>Media</industry>
        <revenue>29795000000</revenue>
        <assets>67994000000</assets>
        <equity>29904000000</equity>
        <netincome>3691000000</netincome>
        <stockprice>84.3</stockprice>
        <employees>26000</employees>
    </bigcompany>
    <bigcompany>
        <company>Company ME</company>
        <industry>Media</industry>
        <revenue>15284000000</revenue>
        <assets>26387000000</assets>
        <equity>9966000000</equity>
        <netincome>1879000000</netincome>
        <stockprice>54.88</stockprice>
        <employees>20915</employees>
    </bigcompany>
</data>

XSLT 1

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>    

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="data">
    <xsl:copy>
      <xsl:apply-templates>        
        <xsl:sort select="industry" order="ascending"/>
        <xsl:sort select="netincome" data-type="number" order="descending"/> 
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
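This first stylesheet is an identity copy that reorders the bigcompany elements by industry (text, ascending) and then by netincome (numeric, descending). A rough Python equivalent on six rows copied from Input.xml follows. Python compares strings by code point, whereas XSLT text sorting may be language-dependent; for these values that makes no difference.

```python
# (company, industry, netincome) copied from six <bigcompany> rows of Input.xml
rows = [
    ("Company OA", "Oil & Gas", 32520000000),
    ("Company OC", "Oil & Gas", 2720000000),
    ("Company OE", "Oil & Gas", 8428000000),
    ("Company MA", "Media", 8004000000),
    ("Company MC", "Media", 4514000000),
    ("Company ME", "Media", 1879000000),
]

# First key: industry ascending; second key: netincome descending (negated number).
ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
for company, industry, netincome in ordered:
    print(company, industry, netincome)
```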

XSLT 2

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/> 

  <xsl:key name="indkey" match="bigcompany/industry" use="."/>

  <xsl:template match="data">
    <data>
    <xsl:for-each select="bigcompany/industry[generate-id() = generate-id(key('indkey', .)[1])]">  
      <xsl:sort select="." order="ascending"/>                    

        <aggdata>
          <xsl:copy-of select="."/>        
          <SumOfRevenue><xsl:copy-of select="sum(key('indkey', .)/../revenue)"/></SumOfRevenue>
          <AvgOfAssets><xsl:copy-of select="sum(key('indkey', .)/../assets) div count(key('indkey', .)/../assets)"/></AvgOfAssets>
          <AvgOfEquity><xsl:copy-of select="sum(key('indkey', .)/../equity) div count(key('indkey', .)/../equity)"/></AvgOfEquity>
          <!-- Max/Min rely on style1.xsl having sorted netincome descending within each industry -->
          <MaxOfIncome><xsl:value-of select="key('indkey', .)[1]/../netincome"/></MaxOfIncome>
          <MinOfIncome><xsl:value-of select="key('indkey', .)[last()]/../netincome"/></MinOfIncome>
          <AvgOfStockPrice><xsl:copy-of select="sum(key('indkey', .)/../stockprice) div count(key('indkey', .)/../stockprice)"/></AvgOfStockPrice>
          <SumOfEmployees><xsl:copy-of select="sum(key('indkey', .)/../employees)"/></SumOfEmployees>
        </aggdata>

    </xsl:for-each>

    </data>
  </xsl:template>
</xsl:stylesheet>
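To make the aggregation in this second stylesheet easier to check, here is a rough Python equivalent over ten rows copied from Input.xml (revenue, assets and netincome only). It is a sketch, not a translation of Muenchian grouping: a plain dict does the grouping that xsl:key does, and max()/min() replace the positional lookups for MaxOfIncome and MinOfIncome, which only yield the maximum and minimum because style1.xsl has already sorted each industry by netincome descending.

```python
from collections import defaultdict
from statistics import mean

# (industry, revenue, assets, netincome) copied from the Oil & Gas and Media rows of Input.xml
rows = [
    ("Oil & Gas", 394105000000, 349493000000, 32520000000),
    ("Oil & Gas", 200494000000, 266026000000, 19241000000),
    ("Oil & Gas", 13807000000, 4726000000, 2720000000),
    ("Oil & Gas", 97800000000, 30500000000, 2700000000),
    ("Oil & Gas", 62004000000, 117144000000, 8428000000),
    ("Media", 48813000000, 84186000000, 8004000000),
    ("Media", 64657000000, 158813000000, 7135000000),
    ("Media", 31867000000, 54793000000, 4514000000),
    ("Media", 29795000000, 67994000000, 3691000000),
    ("Media", 15284000000, 26387000000, 1879000000),
]

# Group by industry: the dict plays the role of xsl:key 'indkey' here.
groups = defaultdict(list)
for industry, revenue, assets, netincome in rows:
    groups[industry].append((revenue, assets, netincome))

aggdata = []
for industry in sorted(groups):  # ascending, like <xsl:sort select="."/>
    revenues, assets, incomes = zip(*groups[industry])
    aggdata.append({
        "industry": industry,
        "SumOfRevenue": sum(revenues),
        "AvgOfAssets": mean(assets),
        # max()/min() need neither a prior sort nor a fixed group size.
        "MaxOfIncome": max(incomes),
        "MinOfIncome": min(incomes),
    })

for rec in aggdata:
    print(rec)
```

The Media and Oil & Gas revenue sums and asset averages agree with the R output above (1.90416e+11 and 7.84346e+10; 7.6821e+11 and 1.535778e+11).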
ParfaitG commented 9 months ago

That said, I wonder whether there is a non-API-breaking way to attempt this XML tree conversion implicitly when the XSLT output method is xml (not text) and the result is well-formed XML, as my first transformation produces, and otherwise fall back to a character or text type. Embedded CDATA XML and/or HTML may be edge cases, though.
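A minimal sketch of that fallback idea in Python, using only ElementTree. `as_tree_or_text()` is a hypothetical helper, not part of xslt, xml2 or lxml, and it skips the output-method check entirely by simply attempting a parse; the sample strings are made up.

```python
import xml.etree.ElementTree as ET


def as_tree_or_text(result):
    """Hypothetical helper: parse a serialized XSLT result into an element
    tree when it is well-formed XML; otherwise return the string unchanged."""
    try:
        return ET.fromstring(result)
    except ET.ParseError:
        return result


print(type(as_tree_or_text("<kml><DATA><ROUTE>1</ROUTE></DATA></kml>")).__name__)  # Element
print(as_tree_or_text("ROUTE,NAME\n1,BRONZEVILLE/UNION STATION"))  # returned as-is
```

A real implementation would still have to decide what to do with results that happen to parse but were meant as text, which is where the CDATA edge cases come in.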

This may also be beyond the package level, since Python's lxml behaves very similarly to R's xslt and requires the same conversion of the string result back into an XML tree. I believe both rely on the same underlying XSLT engine, libxslt.

import lxml.etree as lx

# READ XML AND XSLT
doc = lx.parse("doc.kml")
style1 = lx.parse("style1.xsl")
style2 = lx.parse("style2.xsl")

# RUN TRANSFORMATIONS
transformer1 = lx.XSLT(style1)
new_doc = lx.fromstring(str(transformer1(doc)))

transformer2 = lx.XSLT(style2)
final_doc = transformer2(new_doc)

print(final_doc)
# <?xml version="1.0"?>
# <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
#   <DATA>
#     <ROUTE>1</ROUTE>
#     <ROUTE0>001</ROUTE0>
#     <NAME>BRONZEVILLE/UNION STATION</NAME>
#     <WKDAY>1</WKDAY>
#     <SAT>0</SAT>
#     <SUN>0</SUN>
#     <SHAPE.LEN>34690.953676</SHAPE.LEN>
#   </DATA>
#   <DATA>
#     <ROUTE>2</ROUTE>
#     <ROUTE0>002</ROUTE0>
#     <NAME>HYDE PARK EXPRESS</NAME>
#     <WKDAY>1</WKDAY>
#     <SAT>0</SAT>
#     <SUN>0</SUN>
#     <SHAPE.LEN>110607.498776</SHAPE.LEN>
#   </DATA>
#   <DATA>
#     <ROUTE>3</ROUTE>
#     <ROUTE0>003</ROUTE0>
#     <NAME>KING DRIVE</NAME>
#     <WKDAY>1</WKDAY>
#     <SAT>1</SAT>
#     <SUN>1</SUN>
#     <SHAPE.LEN>88297.447622</SHAPE.LEN>
#   </DATA>
#   ...