I tried to reproduce this with no success. Can you post the exact strings you
are trying to diff with HtmlTestFixture.java?
Also, you could always do some preprocessing before passing the input
strings to DaisyDiff. I am actually using: input = input.replaceAll("&nbsp;"," ");
in production code. Maybe this would solve your problem as well.
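A minimal sketch of that kind of preprocessing, assuming the inputs are plain strings that contain the literal `&nbsp;` entity. The class and method names (`NbspPreprocessor`, `normalizeNbsp`) are hypothetical and not part of DaisyDiff, and also replacing the raw U+00A0 character is an extra assumption beyond the one-liner above.

```java
// Hypothetical helper for illustration only; not part of DaisyDiff.
final class NbspPreprocessor {

    // Replaces the literal "&nbsp;" entity and the raw U+00A0 character
    // with an ordinary space before the strings are handed to the differ.
    static String normalizeNbsp(String input) {
        return input
                .replace("&nbsp;", " ")   // literal HTML entity
                .replace('\u00A0', ' ');  // raw non-breaking space character
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch.
        String one = "<p>breakthrough for&nbsp;&nbsp;&nbsp;Web page designers</p>";
        System.out.println(normalizeNbsp(one));
    }
}
```

`String.replace` is used here rather than `replaceAll` because the pattern is a fixed string, so no regular expression is needed.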
Original comment by kkape...@gmail.com
on 10 Aug 2010 at 3:20
The code that I am using is:
HtmlTestFixture d = new HtmlTestFixture();
String one = "<p>Style sheets represent a major breakthrough for \n Web page designers,expanding their ability to improve the appearance of their pages. </p>";
String two = "<p>Style sheets represent a major breakthrough for Web page designers,expanding their ability to improve the appedfarance oops i am new of their . </p>";
String result = d.diff(one, two);
System.out.println(result);
And the output I get is:
<?xml version="1.0" encoding="UTF-8"?><p>Style sheets represent a major
breakthrough forááá Web page designers,expanding their ability to improve
the <span class="diff-html-removed" id="removed-diff-0" previous="first-diff"
changeId="removed-diff-0" next="added-diff-0">appearance </span><span
class="diff-html-added" id="added-diff-0" previous="removed-diff-0"
changeId="added-diff-0" next="removed-diff-1">appedfarance oops i am new
</span>of their <span class="diff-html-removed" id="removed-diff-1"
previous="added-diff-0" changeId="removed-diff-1" next="last-diff">pages</span>
. </p>
which is almost perfect except for the á characters instead of &nbsp;.
input = input.replaceAll("&nbsp;"," "); will not solve the problem, as you will lose
the data about how much space is present between two words or sections unless
the text is between quotes.
Original comment by dominic....@gmail.com
on 11 Aug 2010 at 5:50
3 points.
1. I tried your example with HtmlTestFixture and got normal spaces (not nbsp,
but not strange characters either).
2. The HtmlTestFixture is very simple (just for unit tests). For
production-quality code I would advise you to look at the main method, which
performs several other cleanups. Normal DaisyDiff does exactly what you want
(see attached screenshot).
3. Can you clarify what data is lost by the "replaceAll" method? In your
example if I run this method then I still have the information that 3 spaces
exist before newline. What data is lost? What is the difference if the text
is in quotes or not?
Original comment by kkape...@gmail.com
on 16 Aug 2010 at 1:07
Attachments:
I really don't understand how this is working at your end. Could it be a JVM issue?
Maybe I could try some other code, as you suggested.
What I meant by "you can't use input.replaceAll("&nbsp;"," ")" can be explained by
viewing the code below in a browser.
<p>hello how are you</p>
<p>hello how are you</p>
The output will be the same.
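To make the point concrete, here is a rough sketch. It assumes a browser collapses runs of ordinary whitespace into a single space when rendering normal text, which the `collapse` method below only approximates. The class name, method name, and sample string are made up for illustration.

```java
// Hypothetical demo; collapse() is only a rough stand-in for browser rendering.
final class WhitespaceCollapseDemo {

    // Collapses runs of ordinary whitespace (space, tab, newline, CR, form feed)
    // into a single space. The "&nbsp;" entity is left untouched.
    static String collapse(String html) {
        return html.replaceAll("[ \\t\\n\\r\\f]+", " ");
    }

    public static void main(String[] args) {
        String withEntities = "<p>hello&nbsp;&nbsp;&nbsp;how are you</p>";
        String withSpaces = withEntities.replace("&nbsp;", " ");

        // The entities survive, so the three-space gap is still encoded.
        System.out.println(collapse(withEntities));
        // The plain spaces collapse to one, so the gap width is lost.
        System.out.println(collapse(withSpaces));
    }
}
```

After the replacement, the two inputs can no longer be told apart once rendered, which is the information loss described above.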
Original comment by dominic....@gmail.com
on 16 Aug 2010 at 3:56
I had the same issue with the &nbsp;.
In my case, htmldiff was correctly replacing the &nbsp; with a non-breaking
space character in UTF-8. On the other hand, my browser was configured with a
char encoding != UTF-8.
Solution: configure your browser char encoding to UTF-8.
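A small sketch of why the mismatch shows up as stray characters: U+00A0 is two bytes in UTF-8 (0xC2 0xA0), and a reader that assumes a single-byte encoding shows each byte as its own character. ISO-8859-1 is used here only as an example of a wrong encoding; which encoding was actually involved is not stated in this thread. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of decoding UTF-8 bytes with the wrong charset.
final class NbspEncodingDemo {

    // Encodes the text as UTF-8, then decodes the bytes as ISO-8859-1,
    // imitating a viewer configured with the wrong character encoding.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Renders each char as a hex code point so invisible characters can be seen.
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";
        System.out.println(hex(nbsp));            // one character
        System.out.println(hex(misdecode(nbsp))); // two characters after mis-decoding
    }
}
```

Other single-byte encodings map the same two bytes to different glyphs, so the exact junk characters depend on the viewer's setting.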
Original comment by mcdoct...@gmail.com
on 19 Nov 2010 at 7:52
dominic, can you check your browser settings?
Maybe what mcdoctore is suggesting is a solution?
Original comment by kkape...@gmail.com
on 19 Nov 2010 at 3:19
It is working now. Thanks.
Original comment by dominic....@gmail.com
on 19 Nov 2010 at 4:30
Closed since it was apparently a browser issue.
Original comment by kkape...@gmail.com
on 20 Nov 2010 at 10:51
Original issue reported on code.google.com by
dominic....@gmail.com
on 2 Aug 2010 at 4:32