zzyengineer / google-code-prettify

Automatically exported from code.google.com/p/google-code-prettify
Apache License 2.0
0 stars 0 forks source link

Pretty printing contents of a code tag can result in extra line breaks #86

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create HTML content like:
<code class="prettyprint">
function testFunc(i) {<br />
&nbsp;&nbsp;alert(i);<br />
};<br />
</code>

2. run prettyPrint method to format the block 

I'd expect the output to appear the same with regard to new lines but
instead there's an extra line break on each line.  This happens because of
the regex on line 1026 (var newlineRe = /\r\n?|\n/g;) which is used to swap
br tags for new lines.  Earlier (on line 651) all br tags are replaced with
new lines so when the later reverse happens the result is to replace two
new lines with two breaks even though visually we only had one break (since
a code block doesn't use new lines for formatting).

A simple work around would be to change the regex to:

var newlineRe = /\r\n?|\n\n?/g;

But that has a caveat for pre blocks since a user could really want two new
lines.  Looking through the code I think the best bet is to treat code as a
special case and remove duplicate breaks:

--- prettify.js 2009-08-04 12:49:00.637381000 -0700
+++ prettify.js.orig    2009-06-19 10:50:25.000000000 -0700
@@ -1395,17 +1395,6 @@
               document.createTextNode('\r'), lineBreak);
         }
       }
-
-      if (cs.tagName === 'CODE') {
-        var lineBreaks = cs.getElementsByTagName('br');
-        for (var j = lineBreaks.length; --j >= 0;) {
-          var lineBreak = lineBreaks[j];
-          var lineBreakSibling = lineBreaks[j].nextSibling;
-          if (lineBreakSibling && lineBreakSibling.tagName === 'BR') {
-            lineBreak.parentNode.removeChild(lineBreakSibling);
-          }
-        }
-      }
     }

     doWork();

Original issue reported on code.google.com by rvro...@gmail.com on 4 Aug 2009 at 7:50

GoogleCodeExporter commented 9 years ago
Hmm.  I thought code elements didn't include line breaks in their innerHTML.

The duplicate break removal will break things like
<pre>
foo()

bar()
</pre>

I could do a computed style check to see whether style.whitespace === 'pre', 
but that
won't work for elements that are disconnected from the DOM.

Original comment by mikesamuel@gmail.com on 10 Aug 2009 at 11:05

GoogleCodeExporter commented 9 years ago
code elements don't behave like pre in that it doesn't honor new lines. e.g. 
this:
<code>some text\nsome other text</code> is rendered like:
some text some other text 

The reason my patch checked the tagname was because I didn't want to break 
formats in
pre.

Original comment by rvro...@gmail.com on 10 Aug 2009 at 11:35

GoogleCodeExporter commented 9 years ago
Not true.

In the below, the first code block acts like a PRE tag, and in the other they 
don't.
 Both are CODE elements, but the set of styles that apply determines whether the
newlines are significant.
<style>.a { white-space: pre }</style>
<h1>A</h1>
<code class=a>some text
some other text</code>
<h1>B</h1>
<code class=b>some text
some other text</code>

Original comment by mikesamuel@gmail.com on 10 Aug 2009 at 11:52

GoogleCodeExporter commented 9 years ago
Ah, I was referring to an code element that had not been altered via a style.  
Yes,
you're correct that the behavior would change if the style had been changed 
such that
white space was now honored.

What about making it an option (only apply "collapsing" to code tags) and 
default it
to off?  

Original comment by rvro...@gmail.com on 11 Aug 2009 at 12:07

GoogleCodeExporter commented 9 years ago
Who would specify the value of the option, and how would they specify it?

Original comment by mikesamuel@gmail.com on 11 Aug 2009 at 12:19

GoogleCodeExporter commented 9 years ago
Why not simply make it a global option like your others?  e.g. like
window['PR_SHOULD_USE_CONTINUATION'], et al

Original comment by rvro...@gmail.com on 11 Aug 2009 at 1:03

GoogleCodeExporter commented 9 years ago
So the plan is to add an API element to let the user choose which <code> 
elements to
do the wrong thing for?

That seems like needless complication.

Original comment by mikesamuel@gmail.com on 11 Aug 2009 at 2:37

GoogleCodeExporter commented 9 years ago
It's only the "wrong" thing if the code element is styled differently than 
default. 
It's the "right" thing if the code element styling is left to the default 
browser
interpretation which means a break element is needed to create new lines and 
thus the
prettify code, as it stands, will add two breaks vs. one.  I'm certainly open to
other ideas if you have them but at this point the code isn't usable by our 
wysiwyg
(and I'd imagine most) editors since it adds break elements for each new line as
enter is hit by the user.

Original comment by rvro...@gmail.com on 11 Aug 2009 at 3:34

GoogleCodeExporter commented 9 years ago
It's wrong if it mangles well formatted code, and the semantic HTML guys
(http://semantichtml.org/ ) often use <code> elements with preformatting.

There are a few easy cases:
  (1) If it's PRE or XMP, then it's preformatted code.  That's what those elements mean.
  (2) If it doesn't contain a newline character, it doesn't matter.
That should handle a lot of the simple cases efficiently.
  (3) If it has a computed style of white-space:pre then it is not preformattted.

That leaves unresolved the case of whether or not something that is not 
attached to
the DOM (and so has no computed style) is preformatted or not.  I think the 
answer is
that it is indeterminate, so we can choose whatever behavior seems easiest.

Does that sound reasonable?

Original comment by mikesamuel@gmail.com on 11 Aug 2009 at 4:49

GoogleCodeExporter commented 9 years ago
Hi Mike,

I won't argue it's sort of a funny formatting case (and wrapping in pre/code 
would be
ideal if it were plausible at this time) but it's not quite as easy (or clean) 
to get
semantic html out of wysiwyg I'm afraid.

I agree that your easy cases are spot on and should not be modified from the 
current
behavior.  Am I correct in interpreting that your final case (4) would only be
applied to text inside a code element and your ok with removing the extra breaks
then?  If so, then that sounds good to me and should work for us just fine.

thanks  

Original comment by rvro...@gmail.com on 11 Aug 2009 at 4:32

GoogleCodeExporter commented 9 years ago
Ok.  It looks like the computed style check handles some weird cases well:
  <pre><code class=prettify>...</code></pre>

I'll go ahead and implement that, and send you a patch so you can check that it 
fixes
your immediate problem.

Original comment by mikesamuel@gmail.com on 12 Aug 2009 at 4:43

GoogleCodeExporter commented 9 years ago
Fixed at revision 80

Original comment by mikesamuel@gmail.com on 14 Aug 2009 at 5:37