simonetallevi / maven-replacer-plugin

Automatically exported from code.google.com/p/maven-replacer-plugin
MIT License

XPath Support for precise Replacements in XML Files #58

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm using the maven-replacer-plugin to maintain quite a big build 
infrastructure with 3 parallel development branches containing 300+ maven 
modules each. The plugin helps me to adjust all the configuration files in my 
setup.
Unfortunately, many of the files that I have to modify are XML files, which are 
sometimes quite tricky to handle. Imagine a POM file in which you want to 
replace all plugin versions but not the dependency versions. The replacer 
plugin currently only allows picking out single lines for replacement. 
When similar lines appear in different contexts, such as the <version> tag in 
plugin and dependency declarations of a POM file, it is difficult to find the 
right lines to replace.

To allow precise replacements in specific parts of XML files, I suggest adding 
XPath support to the plugin. The attached patch contains a working proof of 
concept for this functionality. It extends the plugin configuration with an 
optional <xpath> tag in the <replacement> configuration. If an XPath is 
provided, the plugin treats the file as XML and applies the replacements only 
to the parts that match the specified XPath.

Example
-------

Think of an XML file containing some data sets about people:
<people>
  <person>
    <firstname>Arthur</firstname>
    <lastname>Dent</lastname>
    <occupation>BBC Radio Employee</occupation>
  </person>
  <!-- many other persons -->
  <person>
    ...
  </person>
</people>

Replacing the <firstname> of "Arthur Dent" would be difficult if the XML 
contains more persons named "Arthur". However, with XPath we can select the 
correct person based on additional information in the XML file, e.g.
"/people/person[firstname='Arthur' and lastname='Dent']"

This plugin configuration will only replace the <firstname> of Arthur Dent 
without touching the other persons in the XML file even if their <firstname> is 
also Arthur:
<configuration>
  <file>target/test-classes/people.xml</file>
  <replacements>
    <!-- Arthur Dent's middle name is Philip -->
    <replacement>
      <xpath>/people/person[firstname='Arthur' and lastname='Dent']</xpath>
      <token>(Arthur)</token>
      <value>$1 Philip</value>
    </replacement>
  </replacements>
</configuration>
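
A minimal Java sketch of the idea behind this configuration, not the code from the attached patch. It assumes the replacement is applied as a regex to the text nodes beneath each node selected by the XPath; the class and method names (`XPathScopedReplace`, `replace`) are hypothetical and only standard JAXP/DOM calls are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not the plugin's actual implementation.
class XPathScopedReplace {

    /** Applies token -> value (regex) only to text nodes beneath nodes matched by xpath. */
    static String replace(String xml, String xpath, String token, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                replaceInTextNodes(matches.item(i), token, value);
            }
            // Serialize the modified DOM back to a string with the default transformer.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XPath replacement failed", e);
        }
    }

    // Walks the subtree and rewrites text nodes; element structure is left untouched.
    private static void replaceInTextNodes(Node node, String token, String value) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replaceAll(token, value));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), token, value);
        }
    }

    public static void main(String[] args) {
        String xml = "<people>"
                + "<person><firstname>Arthur</firstname><lastname>Dent</lastname></person>"
                + "<person><firstname>Arthur</firstname><lastname>Miller</lastname></person>"
                + "</people>";
        System.out.println(replace(xml,
                "/people/person[firstname='Arthur' and lastname='Dent']",
                "(Arthur)", "$1 Philip"));
    }
}
```

The second person in the sample input is made up for the demonstration; it is there to show that a record with the same first name but a different last name is not touched.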

Besides the XPath patch, I have also attached a test project that shows some 
example XPath replacements (including the replacement above) on an XML file in 
target/test-classes (copied from src/test/resources). Should you want to 
include the suggested XPath support in the plugin, I'd be glad to improve my 
current alpha implementation and contribute it to the replacer plugin.

Original issue reported on code.google.com by st.fer...@gmail.com on 9 Nov 2011 at 9:17

Attachments:

GoogleCodeExporter commented 9 years ago
Great stuff! I will have a look soon and consider including xpath support.

Original comment by baker.st...@gmail.com on 15 Nov 2011 at 5:23

GoogleCodeExporter commented 9 years ago
I have started development with the help of your patch.
Thanks a lot for the patch and especially for the test data.

Original comment by baker.st...@gmail.com on 15 Nov 2011 at 10:40

GoogleCodeExporter commented 9 years ago
Great! If I can help, just let me know.
Regarding the test data, the MSDN XPath reference provides a sample XML file 
with a few more features than my example: 
http://msdn.microsoft.com/en-us/library/ms256095.aspx

Original comment by st.fer...@gmail.com on 15 Nov 2011 at 10:46

GoogleCodeExporter commented 9 years ago
I am still working on it and I have almost finished.

One thing I have noticed is that the tokenValueMap will not easily support this 
feature, but that's okay.

However, the format of the replaced XML can change significantly from its 
original state, including the XML format header being added even if it wasn't 
present beforehand. I am not sure whether this is a problem, and I am 
considering adding logic to not add the header if it was not originally present.
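
A rough sketch of the header logic considered above, assuming the document is written back through a JAXP `Transformer`. The class name `PrologPreservingWriter` and the method `roundTrip` are hypothetical. The sketch only decides whether a declaration is emitted at all; it does not preserve the declaration's exact text or the file's original indentation, and it ignores a possible byte order mark.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not the plugin's actual implementation.
class PrologPreservingWriter {

    /** True if the source text begins with an XML declaration such as <?xml version="1.0"?>. */
    static boolean hasXmlDeclaration(String xml) {
        // The declaration must be the very first thing in the document.
        // The whitespace check avoids matching e.g. <?xml-stylesheet ...?>.
        return xml.startsWith("<?xml")
                && xml.length() > 5
                && Character.isWhitespace(xml.charAt(5));
    }

    /** Parses and re-serializes the XML, emitting a declaration only if the input had one. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(
                    OutputKeys.OMIT_XML_DECLARATION,
                    hasXmlDeclaration(xml) ? "no" : "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<root><a>1</a></root>"));
        System.out.println(roundTrip("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"));
    }
}
```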

Original comment by baker.st...@gmail.com on 16 Nov 2011 at 3:35

GoogleCodeExporter commented 9 years ago
I had the same problems when I used JAXP and its javax.xml.transform.* classes. 
With that API I could neither control the XML prolog nor the format when 
writing the DOM back to the file. XMLSerializer of Apache Xerces (which 
unfortunately has been deprecated in the latest version of the library) seemed 
to do a much better job. However, I didn't test it extensively. But it seemed 
to preserve the original XML prolog and the indentation of the file.

Original comment by st.fer...@gmail.com on 16 Nov 2011 at 7:41

GoogleCodeExporter commented 9 years ago
I have finished the coding for this feature. It should follow the same usage 
configuration you have provided.

It would be fantastic if you were to test it before I do a release by checking 
out the latest code from trunk and installing the plugin locally.

Thanks again for your help with this feature and I look forward to your 
feedback.

Steven

Original comment by baker.st...@gmail.com on 16 Nov 2011 at 9:49

GoogleCodeExporter commented 9 years ago
I will try it during the next week. Thanks a lot for the integration!

-Stefan

Original comment by st.fer...@gmail.com on 17 Nov 2011 at 9:45

GoogleCodeExporter commented 9 years ago
Hi Steven

I was able to run some tests of the XPath integration on my project. Generally, 
it worked pretty well, but I found a few issues that I could solve with some 
minor modifications to the code (patch attached):

1) The XPath replacements did not work on XML attributes. For example, this 
causes an exception:
   XML:
   <root id="ToBeReplaced" class="root">
     <element id="ID" class="element">
       some text
     </element>
   </root>

   XPath: /root/@id
   Token/Value: ToBeReplaced/MyValue

2) Encoding problems in XPathReplacer.writeXml():
   The XMLSerializer writes to a ByteArrayOutputStream by using UTF-8 (default if the OutputFormat object is not configured with an encoding). After serialization the byte array is converted into a String (bos.toString()) using the platform encoding.
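
For issue 1), here is a small sketch of one way attribute matches could be handled, assuming the XPath is evaluated with `javax.xml.xpath` against a DOM. It is not the code from the attached patch, and the class and method names are hypothetical. An XPath such as `/root/@id` yields attribute nodes rather than elements, so their value is rewritten directly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not the plugin's actual implementation.
class XPathAttributeReplace {

    /** Applies token -> value (regex) to attribute or text nodes selected by xpath. */
    static String replace(String xml, String xpath, String token, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                Node node = matches.item(i);
                short type = node.getNodeType();
                // Attribute and text nodes carry their content in the node value.
                // Element matches would need a walk over their children instead.
                if (type == Node.ATTRIBUTE_NODE || type == Node.TEXT_NODE) {
                    node.setNodeValue(node.getNodeValue().replaceAll(token, value));
                }
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XPath replacement failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root id=\"ToBeReplaced\" class=\"root\">"
                + "<element id=\"ID\" class=\"element\">some text</element></root>";
        System.out.println(replace(xml, "/root/@id", "ToBeReplaced", "MyValue"));
    }
}
```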

To solve problem 1), I added special handling for XML attributes, along with a 
unit test for this fix. For 2), I replaced the ByteArrayOutputStream with a 
StringWriter and configured the OutputFormat object with the encoding of the 
XML document. This fix, however, leads to a slightly different output of the 
XML prolog, which broke some of your unit tests. If the source XML does not 
define an encoding, the resulting XML will not define one either. I fixed the 
unit tests by adding an XML prolog with encoding to the xpath.xml test file.
Besides my fixes, the patch contains a few additional adjustments to the unit 
tests. I had to make these modifications because the XML parser produces 
localized error messages, which did not match the unit test criteria in my 
non-English environment.
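
To illustrate issue 2), the following sketch reproduces the mismatch with plain JDK classes only (it does not use XMLSerializer or OutputFormat). Bytes written as UTF-8 are decoded with a different charset, standing in for a platform default that is not UTF-8, and compared with the StringWriter route, which never leaves character data. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; not the plugin's actual implementation.
class EncodingRoundTrip {

    /** Writes text as UTF-8 bytes, then decodes the bytes with the given charset. */
    static String viaBytes(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(utf8, 0, utf8.length);
        // bos.toString() without arguments would decode with the platform default charset.
        // The charset is passed explicitly here so the effect is reproducible anywhere.
        return new String(bos.toByteArray(), decodeWith);
    }

    /** Keeps the text as characters throughout, so no charset is involved. */
    static String viaWriter(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    public static void main(String[] args) {
        String text = "<name>Zo\u00eb</name>";
        System.out.println(viaBytes(text, StandardCharsets.ISO_8859_1).equals(text));
        System.out.println(viaBytes(text, StandardCharsets.UTF_8).equals(text));
        System.out.println(viaWriter(text).equals(text));
    }
}
```

Pure ASCII content survives either route, which may be why a problem like this can go unnoticed until a file contains non-ASCII characters.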

One thing I couldn't solve is the layout of XML attributes when writing the XML 
back to the file. The XMLSerializer will order all attributes alphabetically 
and put them on one line. This XML, for example...
<root
  attr3="foo"
  attr2="bar"
  attr1="baz">
  <element>some text</element>
</root>

...will end up in this layout after being processed:
<root attr1="baz" attr2="bar" attr3="foo">
  <element>some text</element>
</root>

I can live with this behaviour, but it's not nice, especially when XML elements 
contain many attributes with long values.

-Stefan

Original comment by st.fer...@gmail.com on 21 Nov 2011 at 11:34

Attachments:

GoogleCodeExporter commented 9 years ago
Stefan,

I have applied all your fixes and will be performing a release tonight.

Thanks again for your help,
Steven

Original comment by baker.st...@gmail.com on 21 Nov 2011 at 11:49

GoogleCodeExporter commented 9 years ago
I have finished releasing 1.4.0. 
It usually takes a few hours to be promoted to maven central.

Original comment by baker.st...@gmail.com on 22 Nov 2011 at 9:52

GoogleCodeExporter commented 9 years ago
Just issue clean up.

Original comment by baker.st...@gmail.com on 17 Sep 2012 at 12:44