Closed GoogleCodeExporter closed 8 years ago
Hi Carsten,
I've applied the patch but cannot see any improvements over the pre-patching
output. Perhaps I had the wrong expectations about the outcome can you please
advice? I've attached the two sources i'm using for comparison and the
resulting output.
Original comment by hle.mich...@gmail.com
on 19 Nov 2010 at 10:02
Attachments:
Just to clarify the problem with the output on the previous comment. The diff
for the Manufacturer and Description sections are scrambled together.
Original comment by hle.mich...@gmail.com
on 19 Nov 2010 at 10:14
Hi Michael,
sorry for coming back to you so late. I just checked your example and you are
right, this scenario is not covered by my patch. Daisydiff doesn't know or care
about sections or chapters like e.g. "Manufacturers" from your snippets.
Daisydiff only sees two long list of text elements and tries to find and report
as few differences in them as possible.
Daisydiff would need to learn about those chapters (h1..h6 for a start) and
perform the differencing separately for the text nodes of those chapters.
I.e. DomTreeBuilder should not return a flat list of elements but some kind of
tree of chapters, each with their own list of text nodes.
Ideally you would then create one TextNodeComparator per chapter and perform
the diffing chapter-wise. But this needs some work, because this would lead to
multiple outputs (you want to have just one single result of course).
Cheers,
Carsten
Original comment by pfeif...@kde.org
on 14 Mar 2011 at 4:14
[deleted comment]
Carsten,
Can you post an example or two, where one can see clearly what has changed
before and after applying this patch? This should help people understand the
changes and decided if the patch should be accepted or not.
Original comment by kkape...@gmail.com
on 23 May 2011 at 2:40
I applied the patch (which was tricky because it clashed with the performance
fixes in 1.2) and I note that it fixes
http://code.google.com/p/daisydiff/issues/detail?id=26 for me, without breaking
anything else I could think of. Awesome work Carsten.
Original comment by don.jp.w...@gmail.com
on 15 Jun 2011 at 5:29
Can you create then a new patch (that is based on 1.2) and attach it here?
Then I will apply this to trunk if you think that it should be part of
DaisyDiff.
Original comment by kkape...@gmail.com
on 15 Jun 2011 at 6:41
Yes. Will do. I have a little more testing to do first.
Original comment by don.jp.w...@gmail.com
on 15 Jun 2011 at 6:54
Here's a simple testcase that my patch fixes. Note that the second table cell
in the old.html file ("1XXX") must start with the same character with which the
previous (first) cell ends ("1").
In two-way diff mode, daisydiff creates a new bogus column in the table.
In three-way diff mode it even puts the red removal marker into the first
column, AFAIR.
Original comment by pfeif...@kde.org
on 16 Jun 2011 at 10:49
Attachments:
I will have no time for this in the coming weeks, so I took the time today,
rebased the patch to trunk and committed it. I'm using it internally for almost
a year and had no problems with it.
(There are more problems left, but unrelated to this issue).
Original comment by pfeif...@kde.org
on 17 Jun 2011 at 3:57
The SeparatingNode is missing a hashCode method. Patch provided.
Original comment by don.jp.w...@gmail.com
on 24 Jun 2011 at 7:22
Attachments:
While the SeparatingNode seems to really improve the handling of tables, I'm
not convinced that the check for paretn position in TagNode.isEquals is a good
change. It leads to a lot of cells claiming to be moved when really they've
just had a row inserted before them.
Eg I find that removing that section improves things. I'm attaching a patch
that comments out the part I'm talking about so that people can discuss whether
they like it or not.
Original comment by don.jp.w...@gmail.com
on 24 Jun 2011 at 7:34
Attachments:
Thanks Don. I committed your patch until I find a better solution. The problem
(i.e. the reason for the position-check) manifests itself mostly in three-way
diff, as I found out.
Original comment by pfeif...@kde.org
on 4 Jul 2011 at 12:52
Don,
after commenting out the "parent index" thing in TagNode, the testcases
"broke", as expected. When I had a look at fixing them, it became obvious that
it is currently too time consuming to validate them.
So here's my proposal to fix that:
- rename the file expected.xml to expected.html
- make them html markup instead of xml
- add an html header including a reference to the diff.css
That way, one can easily see the expected output (and also validate if the
expected output make sense at all).
I will then regenerate all expected.html files with the current version. If
you're fine with that, I will commit that tomorrow.
Original comment by pfeif...@kde.org
on 4 Jul 2011 at 4:13
[deleted comment]
Hi Carsten,
switching to html files sounds great to me.
Original comment by donny...@gmail.com
on 5 Jul 2011 at 9:35
Committed in r155. Please re-open if you encounter any issues.
Original comment by kkape...@gmail.com
on 13 Sep 2011 at 9:05
Hi,
I've may have found a fix for Carsten's position checking (which brings too
many drawbacks with it). I simply added more separators by commenting out a
couple of lines of code - a simple patch and I'm attaching it to this issue. I
ran the new code against the file-based unit tests and while the code does
perform worse in a few cases it is a huge improvement overall. I believe it
also solves issue 44.
I went over all unit tests which have 'failed' after adding the new code and my
tally goes like this:
Big improvements in Lists 22, 23, 35; General 5, 6, 35
Small improvements in General 9
Big regression in General 38
Slight regressions in Lists 6, 19 (bad HTML!), 20 (bad HTML!), 31; Paragraphs
43; General 39 (though Lists 19 and 20 contain bad HTML, UL elements cannot be
nested in other UL elements unless they are wrapped in a LI first - if I fix
this the result with new code isn't bad)
Couldn't decide: Lists 4, 5, 17, 18; General 42
I would appreciate if somebody could take the time to review these tests. To do
this, apply the patch, run 'ant test' and go to the 'testdata' directory to
review the test results: failed tests will have the '_actual_result.html' file
present in their directory (not that failure here simply means a different
output, not a failure per se).
Original comment by gre...@gmail.com
on 3 Sep 2012 at 8:33
Attachments:
Hi Gregor,
thanks a bunch for your efforts. The improvements of your patch do indeed look
good to me. The regressions in lists however are a bad. It would be awesome if
we could find a way to get the improvements without the regressions :-)
Can you tell me how you fixed the badly nested ULs so that the results are OK?
Adding a <li> in front of the nested <ul> changes the behavior (you get another
bullet point). Even though it's not correct html to leave out the <li>, we have
to support that, because it is quite common. For example the Mozilla HTML
Commposer (the rich text editor in Thunderbird for example) creates such HTML.
Regarding issue 44 -- I couldn't reproduce that problem at all (i.e. plain
1.2). The result was the same (perfect indentation) with and without your patch.
Thanks,
Carsten
Original comment by pfeif...@kde.org
on 19 Sep 2012 at 8:58
Original issue reported on code.google.com by
kkape...@gmail.com
on 16 Aug 2010 at 1:16Attachments: