Open GoogleCodeExporter opened 9 years ago
JNDiff has an n-way diff and merge algorithm ... to check if it suits the above
scenario
http://jndiff.sourceforge.net
Original comment by ashok.ha...@gmail.com
on 14 Apr 2010 at 8:54
demo for a licensed product --
http://opendocument.deltaxml.com/free/demo/odt/merge/four-editors/
Original comment by ashok.ha...@gmail.com
on 20 Apr 2010 at 10:49
Also el4j
http://sourceforge.net/projects/el4j/files/
http://www.javaworld.com/javaworld/jw-07-2007/jw-07-xmlmerge.html
Also using StAX
http://stax.codehaus.org/Home
http://www.devx.com/ibm/Article/20269
Original comment by ashok.ha...@gmail.com
on 20 Apr 2010 at 11:04
The merge process has 2 parts
1) safeguard - check for overlapping changes - if any are found, don't merge; present an error to the user
2) n-way merge of the ODF documents to produce the final merged ODF
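
A minimal sketch of the safeguard in part 1, assuming each change can be reduced to a (doc_id, start, end) range of positions in the original document. That representation and the function names are hypothetical, not taken from the project code.

```python
from itertools import combinations


def find_overlaps(changes):
    """Return pairs of changes from different documents whose ranges overlap.

    `changes` is a list of (doc_id, start, end) tuples. start/end are
    hypothetical positions in the original document, with end exclusive.
    """
    overlaps = []
    for a, b in combinations(changes, 2):
        same_doc = a[0] == b[0]
        intersects = a[1] < b[2] and b[1] < a[2]
        if not same_doc and intersects:
            overlaps.append((a, b))
    return overlaps


def safeguard(changes):
    """Part 1 of the merge: refuse to merge when any overlap is found."""
    overlaps = find_overlaps(changes)
    if overlaps:
        raise ValueError(f"overlapping changes, not merging: {overlaps}")
    return True
```

Only when safeguard() returns would part 2, the n-way merge, be attempted.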
Original comment by ashok.ha...@gmail.com
on 20 Apr 2010 at 11:55
comparison of currently available xml diff mechanisms
http://www.scribd.com/doc/14482474/XML-diff-survey
Original comment by ashok.ha...@gmail.com
on 20 Apr 2010 at 9:28
TO DO :
test the google diff-match-patch library with odf track changes
http://code.google.com/p/google-diff-match-patch/
Original comment by ashok.ha...@gmail.com
on 21 Apr 2010 at 9:49
Also test diffxml :
http://sourceforge.net/projects/diffxml/files/diffxml/
Original comment by ashok.ha...@gmail.com
on 21 Apr 2010 at 9:49
Original comment by ashok.ha...@gmail.com
on 4 May 2010 at 11:29
We use a hybrid mechanism of XML parsing and recording XML fragments in a db to
merge n XML documents into 1 document.
Using the db reduces the memory required for in-memory processing of the XML.
The basic logic of the merge works as follows --
The primary assumptions
-- The ODF header and the ODF content body are merged independently. This is
because a tracked change in ODF adds header entries to the
<text:tracked-changes> container in the ODF content header.
Merging the header and body as one unit would therefore have required a node-level
merge and synchronization between the change entries and the header. Treating them
independently makes the merge much simpler.
-- There are no overlapping merges. Overlapping merges are identified as an
exception case of the merge process (i.e. the merge
failed...)
merge process -
- We iterate through the 'n' changed documents. Change info is extracted and recorded in a db. The node address of each change is recorded, along with the order of the change (1, 2, 3 ...)
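
A minimal sketch of recording the change info, assuming SQLite as the db. The table layout, column names and the string form of the node address are all hypothetical; the thread does not say which database or schema is used.

```python
import sqlite3


def open_change_db(path=":memory:"):
    """Open (or create) the change store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes ("
        " doc_id TEXT NOT NULL,"
        " change_order INTEGER NOT NULL,"
        " node_address TEXT NOT NULL,"
        " fragment TEXT NOT NULL,"
        " PRIMARY KEY (doc_id, change_order))"
    )
    return conn


def record_change(conn, doc_id, change_order, node_address, fragment):
    """Store one change: where it sits, its order, and the XML fragment."""
    conn.execute(
        "INSERT INTO changes VALUES (?, ?, ?, ?)",
        (doc_id, change_order, node_address, fragment),
    )


def changes_by_order(conn):
    """Read changes back, lowest order number first."""
    return conn.execute(
        "SELECT doc_id, change_order, node_address FROM changes"
        " ORDER BY change_order, doc_id"
    ).fetchall()
```

Keeping the fragments in the db rather than in a parsed tree is what keeps the in-memory footprint small.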
for the content body --
- Change nodes are processed for the 'n' documents starting with the lowest order number
-- The node addresses of the change nodes are compared to identify the shallowest one, i.e. the one that comes first with respect to the original document [1]
-- The shallowest node [1] is handled first: all the preceding:: nodes of the shallowest node are captured and streamed into an XML document [2]
-- The shallowest node itself is then streamed into the incremental XML document [2]
-- The next shallowest node [3] of the 'n' documents is handled next and the same process is repeated, except that only the preceding:: nodes back to the end of the [1] node are streamed into the XML document.
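
The streaming steps above can be sketched as follows, under a strong simplification: the original body is a flat list of top-level blocks and every change is an insert placed before a given block index. The tuple layout and the function name are hypothetical.

```python
def merge_inserts(original, changes):
    """Stream the original blocks and the inserted fragments into one list.

    original: list of serialized top-level blocks of the original body.
    changes:  list of (position, doc_id, fragment); the fragment is
              inserted before original[position]. Hypothetical layout.
    """
    merged = []
    cursor = 0  # index of the first original block not yet streamed
    # Sorting by position handles the earliest change first.
    for position, doc_id, fragment in sorted(changes):
        # Stream only the preceding blocks that were not streamed already.
        merged.extend(original[cursor:position])
        # Then stream the change node itself.
        merged.append(fragment)
        cursor = position
    # Whatever follows the last change is copied through unchanged.
    merged.extend(original[cursor:])
    return merged
```

The cursor plays the role of "the end of the [1] node": each later change only streams what lies between the previous change and itself.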
for the content header --
- The content header is a simpler header-detail XML merge scenario. A ready-made tool like diffxml will be used for this.
Original comment by ashok.ha...@gmail.com
on 6 May 2010 at 8:47
To compare node order :
[Compare node order
http://code.google.com/p/doctype/wiki/ArticleNodeCompareDocumentOrder]
Original comment by ashok.ha...@gmail.com
on 6 May 2010 at 1:40
Node order is compared using compareDocumentPosition()
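
compareDocumentPosition() is a DOM method. As a rough stand-in for illustration only, the same ordering can be sketched in Python with xml.etree.ElementTree by numbering the nodes in a document-order traversal. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def document_order_index(root):
    """Map each element to its position in document order."""
    # Element.iter() walks the tree depth-first, i.e. in document order.
    return {node: i for i, node in enumerate(root.iter())}


def compare_document_order(index, a, b):
    """Return -1 if a precedes b, 1 if a follows b, 0 if same node."""
    return (index[a] > index[b]) - (index[a] < index[b])
```

The index is built once per original document, so each comparison afterwards is a dictionary lookup.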
Original comment by ashok.ha...@gmail.com
on 6 May 2010 at 1:53
<text:change-start> <text:change-end> can encompass whole sections and tables.
We need to collapse an insert change: extract it temporarily and replace it
with a marker, process the merge, and then replace the marker with the extracted
text (XML).
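
A rough sketch of the collapse and restore steps, working on serialized XML with a regular expression. It assumes the change markers are serialized exactly as shown in the pattern and that the span between them is self-contained; the comment-style marker is a hypothetical choice.

```python
import re

# Matches one change-start ... change-end span with the same change id.
CHANGE_SPAN = re.compile(
    r'<text:change-start text:change-id="(?P<id>[^"]+)"/>'
    r'(?P<body>.*?)'
    r'<text:change-end text:change-id="(?P=id)"/>',
    re.DOTALL,
)


def collapse_inserts(xml_text):
    """Replace each insert span with a marker; return text and the spans."""
    extracted = {}

    def to_marker(match):
        change_id = match.group("id")
        extracted[change_id] = match.group(0)
        return f"<!--merge-marker:{change_id}-->"

    return CHANGE_SPAN.sub(to_marker, xml_text), extracted


def restore_inserts(xml_text, extracted):
    """Put the extracted spans back in place of their markers."""
    for change_id, fragment in extracted.items():
        xml_text = xml_text.replace(f"<!--merge-marker:{change_id}-->", fragment)
    return xml_text
```

The merge would run on the collapsed text, so a span covering a whole section or table is handled as a single unit.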
Original comment by ashok.ha...@gmail.com
on 6 May 2010 at 3:21
A more efficient approach is to group changes by the parent node containing the
change.
Since the parent nodes always exist in the parent document -- the parent node
groupings can be ordered by using compare document position in the original
document. Node change processing can then be localized to within the parent
node groupings.
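
A minimal sketch of the grouping, assuming each change carries an ElementTree path to its parent node in the original document. The (parent_path, doc_id, fragment) layout and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_changes_by_parent(original_root, changes):
    """Group changes by parent node and order the groups by document order.

    changes: list of (parent_path, doc_id, fragment), where parent_path
    is an ElementTree path relative to original_root (an assumption).
    """
    order = {node: i for i, node in enumerate(original_root.iter())}
    groups = defaultdict(list)
    for parent_path, doc_id, fragment in changes:
        parent = original_root.find(parent_path)
        if parent is None:
            raise KeyError(f"parent not in original document: {parent_path}")
        groups[parent].append((doc_id, fragment))
    # Parents all exist in the original, so their order is well defined.
    return sorted(groups.items(), key=lambda item: order[item[0]])
```

Each returned group can then be processed on its own, without looking at nodes outside that parent.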
Original comment by listmans...@gmail.com
on 9 May 2010 at 7:00
Tested for 2-way insert merge.
Inserts add sections to different parts of the document.
Preceding, following incremental document change is captured in the db and on
the file system.
To Do :
-------
- Build merged document from extracted parts
- Test more complex insert scenarios
- Add logic for delete scenarios
- Check for overlaps
Original comment by ashok.ha...@gmail.com
on 12 May 2010 at 4:01
Setting milestone current issues
Original comment by ashok.ha...@gmail.com
on 14 May 2010 at 11:20
Original issue reported on code.google.com by
ashok.ha...@gmail.com
on 14 Apr 2010 at 8:51