htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 418 forks source link

wrap-php multiple lines #437

Closed bfrmtx closed 3 years ago

bfrmtx commented 8 years ago

For somewhat reason the (by default) enabled option wrap-php does not work when multiple statements are following line by line. I used tidy --wrap-php yes --char-encoding utf8 --clean yes -indent --indent-spaces 2 --wrap 90 --input-encoding utf8 --tidy-mark no mfs-06e_coil.php > out.php

the file

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="description" content="manual">
    <title>
     MFS-06e
    </title>
  </head>
  <body>
<h1>MFS-06e</h1>

<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/1_product_description.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/2_sensor.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/3_preamplifier.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/4_transfer_functions.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/5_integrated_calibration_facility.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/6_calibration_by_manufacturer.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/7_sensor_noise.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/8_installation_of_magnetometers.inc"); ?>
<?php include("$path" . "/metronix_man/en/manuals/mfs-06e/9_pinout_of_connector_socket.inc"); ?>
  </body>
</html>
geoffmcl commented 8 years ago

@bfrmtx thanks for the report...

I assume the problem is that the above php will be output as one line...

In checking the code, it seems some time back we removed a conditional line flush, after each php block... I have not had a chance to look deeper into why? When?

But as a test, I applied the following small patch -

diff --git a/src/pprint.c b/src/pprint.c
index a4f2aef..9d443ea 100644
--- a/src/pprint.c
+++ b/src/pprint.c
@@ -1819,7 +1819,9 @@ static void PPrintPhp( TidyDocImpl* doc, uint indent, Node *node )
                 indent, node );
     AddString( pprint, "?>" );

-    /* PCondFlushLine( doc, indent ); */
+    if (wrapPhp)
+        PCondFlushLine( doc, indent );
+
     WrapOn( doc, saveWrap );
 }

While that restores each php statement to a new line, whether you add one or not, I am not sure it is the correct use of wrapPhp... Maybe we should be using another config parameter???

And what about if you write -

<?php include("/a.inc"); ?><?php include("/b.inc"); ?><?php include("c.inc"); ?>

Should they be kept as one line, or a newline added? At the time these blocks are output tidy has no memory of whether the user put them on one line, or over many lines... that information is lost...

But have I got the right idea here? Look forward to your further comments...

bfrmtx commented 8 years ago

Sure, wrap php shall force a new line - that is the exactly the same behavior as we expect with HTML code <?php include("/a.inc"); ?> <?php include("/b.inc"); ?> <?php include("c.inc"); ?>

Thank you for your fast and kind response

geoffmcl commented 8 years ago

@bfrmtx, well I explored back into the source to find out exactly when that PCondFlushLine() was commented out, to try to understand why... and came up with a surprise...

It was imported like that, when this github repo was initialized, by @sideshowbarker, on Nov 17, 2011, so went back into the old, old cvs source, and again, is was imported like that, in Revision 1.1, Sun May 27 02:27:46 2001 UTC (15 years, 1 month ago) by terry_teague, when the SF cvs repo was created...

Tidy has always been like this... that is never placing successive php blocks on a new line... so this is certainly a Feature Request... a change in long established behavior...

I have not yet done a search in all the issues... ie has this come up before... strange if it has not... but, sure, it seems worthy of consideration...

Now I do not write much php in html, but the following seems another case, where keeping blocks inline would be important -

<p>Issue <?php echo $version ?><?php echo $date ?></p>

With this patch this becomes -

<p>Issue <?php echo $version ?>
<?php echo $date ?>
</p>

Now while browsers, like tidy, tend to ignore newlines, they will usually add at least a space, which may not be desired here... you would get $version $date versus $version$date...

So while this simple patch, or a similar patch, is going to potentially change things for those that do use php in html, and then use tidy...

And like I advised, at the time of this pretty printing, tidy has no memory of whether the user added, or did not add, a newline between consecutive php blocks...

Maybe we do need a way to retain that info, and use it to help decide, and this may assist in other issues, like say #329 - still open...

We do still have one open php issue, #392 - creeping indent on php blocks - but maybe this does not seem to effect that... that too still needs a fix...

And we do have one php block in the testbase, namely 433656, but this patch does not effect that output... so would not signal a regression flag...

So, like I say, seems like a sound new feature, but since tidy has been like this for 15 plus years, will leave it a little time longer for more consideration and comments...

What do other think? Thanks...

bfrmtx commented 8 years ago

Ok, that's fine for me. I checked out your code, did the mentioned modification - works fine! I would also agree to make a new feature out of it in order not to disturb a possible "diff check"

I have added a test case (with fake number) for you - in case it becomes a new feature.

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>[ #123456789 ] Improve support for wrap PHP </title>
<!-- wrap php single to multiple lines properly -->
</head>
<body>
<p>
<strong>wrap-php-extended</strong><br>
Type: <strong>Boolean</strong><br>
Default: <strong>no</strong><br>
Example: <strong>y/n, yes/no, t/f, true/false, 1/0 </strong><br>    
This option specifies if Tidy should line wrap text contained within PHP pseudo elements, which look like: <?php ... ?>. <br>
The extended option will additionally force newlines for nested php statements.<br>
 </p>
 <p><br><br></p>
 <p>test with: <br>
 tidy --wrap-php yes --char-encoding utf8 --clean yes -indent --indent-spaces 2 --wrap 90 --input-encoding utf8 --tidy-mark no in.html > out.html</p>
 <p>issue: <a href="https://github.com/htacg/tidy-html5/issues/329">#329</a></p>
 <?php mkheader(); ?><?php mkbody(); ?><?php mkfooter(); ?>
</body>
</html>
geoffmcl commented 8 years ago

@bfrmtx thanks for taking the time and considering this more...

Ok, if we go for a new feature what would be its name?

You suggest wrap-php-extended, which is maybe ok, but what about wrap-php-tags, or something...

And there is a README OPTIONS.md, which enumerates the basic four steps of a new option, the last just being its use in the code...

And adding a test regression case is optional, and anyway it is in another repo... note the refactor branch... so would be a separate issue anyway...

But maybe we are getting ahead of ourselves... maybe a new option is not required!

First we need to understand what does the current wrap-php, namely TidyWrapPhp do, if anything? I am struggling to understand exactly what it does!

The description is simple "... should line wrap text contained within PHP pseudo elements, which look like: <?php ... ?>."... but in all tests so far, I can not make that happen...

After the <? is added to the output, the actual php text content is done by PPrintText, and the only difference is the mode, from (wrapPhp ? CDATA : COMMENT)... and --wrap-php is on by default...

Now it seems in PPrintText the text data will be added character by character to the print buffer, and it seems these two modes, CDATA or COMMENT are treated the same, with the code -

    /* comment characters are passed raw */
    if ( mode & (COMMENT | CDATA) )
    {
        AddChar( pprint, c );
        return;
    }

And when you think about it more, how could tidy wrap php script!

It does not know php syntax, and thus has no idea where line breaks would be allowed in such a script... ergo, it can not wrap any of this inner php text...

While maybe indenting is ok - just adding space to the beginning of each line - although as mentioned there is a problem with that in #392... it seems wrapping is impossible...

So maybe we have this useless wrap-php option, that becomes available for another use...

Maybe it should default to no, the description amended to say like... add a newline after each php pseudo element <? ... ?>, and then the patch does what you want...

Always interesting how quickly a simple idea can get a lot more complicated ;=))

Look forward to further comments...

geoffmcl commented 7 years ago

While this still seems a good Feature Request, and potentially a new option, nothing has been done about this to date...

Accordingly, since we are about to release 5.4, moving this out to next 5.5.

As always, comments, patches, PR very welcome... thanks...

geoffmcl commented 6 years ago

Review and more RESEARCH:

The wrap-php option

Seems wrap-php is only used to add CDATA or COMMENT to the mode in a call to PPrintText, from PPrintPhp, as follows...

    PPrintText( doc, (wrapPhp ? CDATA : COMMENT),
                indent, node );

But how is this mode later used?

Well in PPrintText it is passed to TextEndsWithNewline, but there the test of mode is -

if ( (mode & (CDATA|COMMENT)) && ...

so it make no difference there, one or the other...

It is also passed to TextStartsWithWhitespace, but again the test of mode is -

if ( (mode & (CDATA|COMMENT)) && ...

So again it makes no difference...

Finally, IFF the character being printed is a &, the mode is passed to PPrintChar, sometimes with CDATA also added, or not, so maybe different, but again the code says -

    /* comment characters are passed raw */
    if ( mode & (COMMENT | CDATA) )
    {
        AddChar( pprint, c );
        return;
    }

And yet again, it seems to make no difference when printing PHP text... It does make a difference in printing other text nodes...

So as suggested above this option wrap-php makes no significant difference, yes or no. It is presently a completley useless option, not really changing anything... could even be removed, but that should be a separate issue...

And may likewise apply to some other script type wrap options...

Re-use of wrap-php option

So, also as stated, Tidy could not really wrap PHP text. It does not understand PHP syntax, so could never wrap such text, so could this option be changed, and used to wrap PHP blocks? That is to add a newline between successive <?php ... text ... ?> blocks?

BUT as stated, that could add a newline, which would probably be treated as a space character in a browser, adding perhaps an unwanted space... as shown above... so now, on further reflection, do not think I agree with this...

So, in total, while it seems nice to apply line wrapping to the original sample PHP blocks, as requested, rather than outputting a single line as current tidy does, that single line is also not wrong...

And could not those multiple php blocks be written as one block, like -

<p>
<?php 
include("a.inc");
include("b.inc");
include("c.inc"); 
?>
</p>

Conclusion

At this point I am thinking of adding Won't Fix, and closing this, especially since we are headed for a new release this month, and no one has stepped up with a solution here, or even provided further feedback in more than a year... comments welcome... thanks...

geoffmcl commented 6 years ago

Have 2nd, 3rd, 4th,... thoughts on this, and after much more testing ;=))

First my reading of the code logic was wrong in parts... But still in all tests conducted with really long php lines, already wrapped, or not, short, etc, etc, I could not make this option do anything... the default yes or no always ouput the same thing... maybe I need to test more, got it wrong again! But...

But since the wrap-php option presently seems to do nothing, why not use it as suggested above?

The patch is very simple, only using --wrap-php yes to wrap consecutive <?php ... ?> blocks, if desired... default no, and adjusted the description accordingly... which it seems more or less what @bfrmtx wanted back in 2016... Well not exactly wrap, but adds a new line at the end of the blcok, which is nore or less the same thing...

At least this causes a difference between yes and no in some of my 8 test cases, see in_437*.html, while there is none with current 5.5.76 tidy...

This is pushed to an issue-437 branch for testing, and comments...

As stated above, this probably applies to all wrap-xxx options, but will leave that to separate issues, when/if brought up...

As also implied, it seems these were an option to wrap these various pseudo tags, in that if off, would temporarily set TidyWrapLen to 0xFFFFFFFF, with the comment /* very large number */, but perhaps subsequently always disabled this wrap... at least I could not get it to work as advertised, but maybe I got the tests wrong...

What do others think? Maybe I should just give up, and close, but this branch does do what @bfrmtx asked, I think... Thanks...

geoffmcl commented 6 years ago

Created PR #645 to address this... review, testing and feedback welcome... thanks...

balthisar commented 6 years ago

@geoffmcl, this works for me as described, but...

I poked through the source code and ran many tests prior to testing this PR, and for the life of me, there's nothing that indicated what wrap-php was supposed to have done! I scoured through the old mailing list, but did not look at archived source code for changes, and this is completely strange.

I’ve also done some testing with wrap-jste, wrap-asp, and wrap-sections, and I can’t seem to get them to affect any of my tests at all!! A brief look at their code indicates that they always PPrintText() in mode CDATA or COMMENT, which the pretty printer is treating identically anyway!

So now I wonder this:

FWIW, the new behavior is desired and I support the PR, possibly as-is if we answer the questions above, possibly with a different option name.

geoffmcl commented 6 years ago

@balthisar well I think all those questions have been raised, and discussed, and felt answered, before I presented the simple re-use PR #645, so am still for merging it as is... but...

But if you want to open another issue like What about the other **wrap-xxx** options that do **nothing**?, then that is ok, including using some depreciation mechanism you mention, but do not describe... How can we know what you have in mind here?

Or if you want to suggest that the new behaviour be implemented in a new option, like wrap-php-blocks or wrap-php-elements then again go for it... Maybe close my PR and present a new one... not too hard... Not much code to change... reinstate the do nothing wrap-php, a few re-names, new enum, etc...

But me, feel I have discussed, researched, this to death, and not prepared to do anything different, or any more at this time... maybe being a bit under-the-weather atm features in this... do not know...

Or just move it out to a new milestone, beyond this release, close the PR, and we do dicuss it more... it has already been open since Jul 11, 2016, and could probably wait some more...

Look forward to your choice... thanks...

balthisar commented 6 years ago

I support the change, and being under the weather is making you a bit grumpy, it seems! The reason I wanted discussion was to, you know, discuss. This now puts us in a place where Tidy does something with wrap-php, but does nothing with the other options, meaning that from a user's perspective, there are inconsistencies.

"What to do" hinges on what we might want to do in the future. Do we want to deprecate the other options? That means wrap-php will stand on its own, thus avoiding any inconsistencies. Do we want the other wrap- options to do the same thing as wrap-php does, someday? Then we leave it alone. Should they be called wrap-xxx-blocks if we ever implement them? Then why not make this wrap-php-blocks?, etc.

If the vision is limited to one change, then that's not a vision and everything becomes ad hoc.

tl;dr: I'm just trying ask, where do you see the other options going?

geoffmcl commented 6 years ago

As this is still open, and under discussion moving the milestone out...

geoffmcl commented 6 years ago

@balthisar one last quiet, calm try before this Jul 11, 2016 issue gets moved out to the next milestone... yet again...

tl;dr: I'm just trying ask, where do you see the other options going?

And I am trying to answer, that is not the issue being addressed here. If you want to address other than wrap-php, i.e. this issue, then please open a new issue for that, with broader scope...

Then preferable add some code changes, patch or PR to support that issue... and that might include further changes in wrap-php as part of it... let your coding fingers do the talking ;=))

"What to do" hinges on what we might want to do in the future...

Can't see what I might want to do in the future! And that means open, discuss, code, and present other patches, PR's on other than wrap-php, although that may also include wrap-php...

If the vision is limited to one change, then that's not a vision and everything becomes ad hoc.

Yes, at this time it is limited to one change. Whether you call that a vision, or ad hoc does not seem to matter, although prefer the former...

But I do not think because of this one change, then everything becomes ad hoc. That seems a gross exaggeration...

And am not too concerned that some very astute user might discover there is a set of wrap-xxx options, 3-4 I think, and here only one, at this time, is being changed, making it inconsistent with the others at this time...

I have spent time, discussed, and presented PR #645 just for wrap-php... just a few lines of code...

That does not preclude me having a vision, open another issue, discuss, and code and present something more inclusive in the future, but just not now...

And that might include some sort of renaming of it, or all of them, to wrap-php-blocks, or something... which might then include depreciating the current names... but just not now...

We have this code for now, just for this specific issue... merge it or not is the only reply I want...

As stated earlier, this issue has already been around for some considerable time, so skipping it for now is also no problem... thanks...

geoffmcl commented 3 years ago

@bfrmtx after quite a long delay, have merged PR #645, thus closing this... thanks...