htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org

tidy 5.6.0 warning `inserting missing 'title' element` appears in php-only files #728

Closed justinhartman closed 6 years ago

justinhartman commented 6 years ago

Hi guys,

I'm having a problem with tidy: it keeps showing me the warning `inserting missing 'title' element` on php-only files. I can't find any way to ignore this warning and it's bugging the living hell out of me. I'm using tidy along with two plugins for Sublime Text 3, and I get this warning both when running tidy directly from the command line and in the console output of Sublime.

This is my environment.

Hardware

MacBook Pro (13-inch mid-2012)
macOS High Sierra - 10.13.3
2.5 GHz Intel Core i5
10 GB 1600 MHz DDR3
Intel HD Graphics 4000 1536 MB

Software

Package Manager     Homebrew 1.6.2
Package Source      Homebrew/homebrew-core (git revision c010d8; last commit 2018-04-28)
tidy Version        HTML Tidy for Apple macOS version 5.6.0
Editor              Sublime Version 3.0, Build 3143
Editor Plugin       SublimeLinter 4.4.0
Editor Plugin       SublimeLinter-html-tidy 2.0.0

I installed tidy directly via homebrew:

$ brew install tidy-html5
$ which tidy
/usr/local/bin/tidy
$ ls -lh /usr/local/bin/tidy
lrwxr-xr-x  1 macbookpro  admin    35B 23 Apr 16:05 /usr/local/bin/tidy -> ../Cellar/tidy-html5/5.6.0/bin/tidy
$ ls -lh /usr/local/Cellar/tidy-html5/5.6.0/bin/tidy
-r-xr-xr-x  1 macbookpro  admin   731K 25 Nov 15:54 /usr/local/Cellar/tidy-html5/5.6.0/bin/tidy

Settings

This is my configuration file for SublimeLinter in the Sublime Text editor:

// SublimeLinter Settings - User
{
    "lint_mode": "save",
    "no_column_highlights_line": true,
    "paths": {
        "osx": 
        [
            "/usr/local/bin/"
        ]
    },
    "show_panel_on_save": "view",
    "linters":
    {
        "htmltidy":
        {
            "@disable": false,
            "args": 
            [
                "--show-warnings",
                "true"
            ],
            "excludes": [],
            "ignore_match": [
                "missing <!DOCTYPE> declaration",
                "inserting missing 'title' element"
            ],
        },
        "php":
        {
            "@disable": false,
            "ignore_match": [
                "missing <!DOCTYPE> declaration",
                "inserting missing 'title' element"
            ],
        },
        "phpcs":
        {
            "@disable": false,
            "excludes": [],
            "ignore_match": [
                "missing <!DOCTYPE> declaration",
                "inserting missing 'title' element"
            ],
        },
        "json":
        {
            "@disable": false,
        },
    },
}

I am using SublimeLinter-html-tidy as the plugin that ties tidy into the Sublime user space so that I can lint html documents. In the above SublimeLinter settings I am using ignore_match to exclude the `inserting missing 'title' element` warning from appearing in the htmltidy, php and phpcs linter plugins.

SublimeLinter-html-tidy, or rather tidy itself, is not respecting this setting and continues to output the warning. I don't need tidy to do this for me; I have phpcs installed to deal with linting of php files.
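
For illustration, the filtering that the ignore_match lists above are meant to express can be sketched as follows. This is a minimal sketch, not SublimeLinter's actual implementation: the function name `filter_messages` is hypothetical, and it assumes each pattern is treated as a regular expression searched for anywhere in the message.

```python
import re


def filter_messages(messages, ignore_patterns):
    """Drop every message that matches at least one ignore pattern.

    Hypothetical helper: assumes patterns are regular expressions that
    may match anywhere in the message text.
    """
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [
        message
        for message in messages
        if not any(rx.search(message) for rx in compiled)
    ]


if __name__ == "__main__":
    patterns = [
        "missing <!DOCTYPE> declaration",
        "inserting missing 'title' element",
    ]
    reported = [
        "line 106 column 1 - Warning: inserting missing 'title' element",
        "line 3 column 1 - Warning: discarding unexpected </div>",
    ]
    # Only the second message survives the filter.
    for message in filter_messages(reported, patterns):
        print(message)
```

A filter like this only hides messages after the linter has run; it does not stop tidy from being invoked on a php file in the first place.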

But, as this is a bug report about tidy and not Sublime or its plugins, here is the output of $ tidy -show-config from the terminal:

Configuration File Settings:
Name                        Type       Current Value
=========================== =========  ========================================
accessibility-check         Enum       0 (Tidy Classic)
add-meta-charset            Boolean    no
add-xml-decl                Boolean    no
add-xml-space               Boolean    no
alt-text                    String
anchor-as-name              Boolean    yes
ascii-chars                 Boolean    no
assume-xml-procins          Boolean    no
bare                        Boolean    no
break-before-br             Boolean    no
char-encoding               Encoding   utf8
clean                       Boolean    no
coerce-endtags              Boolean    yes
css-prefix                  String     c
custom-tags                 Enum       no
decorate-inferred-ul        Boolean    no
doctype                     String     auto
drop-empty-elements         Boolean    yes
drop-empty-paras            Boolean    yes
drop-proprietary-attributes Boolean    no
enclose-block-text          Boolean    no
enclose-text                Boolean    no
error-file                  String
escape-cdata                Boolean    no
escape-scripts              Boolean    yes
fix-backslash               Boolean    yes
fix-bad-comments            Enum       auto
fix-style-tags              Boolean    yes
fix-uri                     Boolean    yes
force-output                Boolean    no
gdoc                        Boolean    no
gnu-emacs                   Boolean    no
hide-comments               Boolean    no
indent                      Enum       no
indent-attributes           Boolean    no
indent-cdata                Boolean    no
indent-spaces               Integer    2
indent-with-tabs            Boolean    no
input-encoding              Encoding   utf8
input-xml                   Boolean    no
join-classes                Boolean    no
join-styles                 Boolean    yes
keep-tabs                   Boolean    no
keep-time                   Boolean    no
literal-attributes          Boolean    no
logical-emphasis            Boolean    no
lower-literals              Boolean    yes
markup                      Boolean    yes
merge-divs                  Enum       auto
merge-emphasis              Boolean    yes
merge-spans                 Enum       auto
mute                        String
mute-id                     Boolean    no
ncr                         Boolean    yes
new-blocklevel-tags         Tag Names
new-empty-tags              Tag Names
new-inline-tags             Tag Names
new-pre-tags                Tag Names
newline                     Enum       LF
numeric-entities            Boolean    no
omit-optional-tags          Boolean    no
output-bom                  Enum       auto
output-encoding             Encoding   utf8
output-file                 String
output-html                 Boolean    no
output-xhtml                Boolean    no
output-xml                  Boolean    no
preserve-entities           Boolean    no
priority-attributes         Attribute
punctuation-wrap            Boolean    no
quiet                       Boolean    no
quote-ampersand             Boolean    yes
quote-marks                 Boolean    no
quote-nbsp                  Boolean    yes
repeated-attributes         Enum       keep-last
replace-color               Boolean    no
show-body-only              Enum       no
show-errors                 Integer    6
show-info                   Boolean    yes
show-meta-change            Boolean    no
show-warnings               Boolean    yes
skip-nested                 Boolean    yes
sort-attributes             Enum       none
strict-tags-attributes      Boolean    no
tab-size                    Integer    8
tidy-mark                   Boolean    yes
uppercase-attributes        Enum       no
uppercase-tags              Boolean    no
vertical-space              Enum       no
warn-proprietary-attributes Boolean    yes
word-2000                   Boolean    no
wrap                        Integer    68
wrap-asp                    Boolean    yes
wrap-attributes             Boolean    no
wrap-jste                   Boolean    yes
wrap-php                    Boolean    yes
wrap-script-literals        Boolean    no
wrap-sections               Boolean    yes
write-back                  Boolean    no

PHP Sample

For your reference, here is my full php file so that you can see there is no hint of html in it. I am including the full file (instead of a sample) because the outputs later in this ticket refer to html5 and the missing title element, and I'm very confused how the file below can create this perception that the content is html5 content.

<?php
/**
 * Adminer Custom (https://github.com/pematon/adminer-custom)
 * Copyright (c) 2014-2018 Pematon, s.r.o. (http://www.pematon.com/)
 *
 * Licensed under The MIT License For full copyright and license information,
 * please see the LICENSE. Redistributions of files must retain the above
 * copyright notice.
 *
 * @category  Index
 * @package   AdminerCustom
 * @author    Peter Knut <peter@pematon.com>
 * @author    Justin Hartman <justin@hartman.me>
 * @copyright 2014-2018 Pematon, s.r.o. (http://www.pematon.com/)
 * @license   https://opensource.org/licenses/MIT MIT License
 * @version   1.3.0
 * @link      https://github.com/pematon/adminer-custom
 */

/**
 * Adminer Custom (https://github.com/pematon/adminer-custom)
 *
 * @category  Index
 * @package   AdminerCustom
 * @author    Peter Knut <peter@pematon.com>
 * @author    Justin Hartman <justin@hartman.me>
 * @copyright 2014-2018 Pematon, s.r.o. (http://www.pematon.com/)
 * @license   https://opensource.org/licenses/MIT MIT License
 * @version   1.3.0
 * @link      https://github.com/pematon/adminer-custom
 */

/**
 * Adminer Object Method
 *
 * This function configures the Adminer app by loading and configuring
 * the site's plugins.
 *
 * @return array AdminerPlugin() returns an array of objects which
 *               contains the plugins and settings from the loaded
 *               plugin configuration files.
 */
function adminObject()
{
    // Required to run any plugin.
    include_once __DIR__ . '/plugins/plugin.php';

    // Plugins auto-loader.
    foreach (glob("plugins/*.php") as $filename) {
        include_once __DIR__ . '/'.$filename;
    }

    // Specify enabled plugins here.
    $plugins = [
    new AdminerDatabaseHide(["mysql", "information_schema", "performance_schema"]),
    new AdminerLoginServers(
        [
            filter_input(INPUT_SERVER, 'HTTP_HOST') => filter_input(INPUT_SERVER, 'SERVER_NAME')
        ]
    ),
    new AdminerSimpleMenu(),
    new AdminerCollations(),
    new AdminerJsonPreview(),

    // AdminerTheme has to be the last one.
    new AdminerTheme(),
    ];

    return new AdminerPlugin($plugins);
    /**
     * Whether to use Adminer or the Editor file.
     *
     * @var boolean editorSwitch() true or false.
     */
    $editor = false;
    return editorSwitch($editor);
}//end admin_object()

/**
 * Include original Adminer or Adminer Editor.
 *
 * You can use one of the two files below depending on your use case. Simply set
 * editor variable to true to use Adminer Editor instead of the original Adminer.
 *
 * NB: It is important to note the Adminer Editor does not inherit the styles
 * that Adminer does. This is because plugins do not work for Adminer Editor so
 * you are using the default Adminer Editor with no additional styles or
 * functionality other than what is provided by the original Editor file.
 *
 * @author Justin Hartman <justin@hartman.me>
 * @since  1.3.0
 * @param  boolean $editor If defined as `true` the function will use the Adminer
 *                         Editor script instead of the Adminer script (default).
 *
 * @return include Includes one of two files in `adminer.php` or `editor.php`.
 */
function editorSwitch($editor)
{
    if ($editor === false) {
        $editorFile = '/adminer.php';//Adminer
    } elseif ($editor === true) {
        $editorFile = '/editor.php';//Adminer Editor
    }
    include __DIR__ . $editorFile;
}//end editorSwitch()

Errors when running tidy

I created the above php-only file with zero html in it, so there is no possible way I can add a title element. There is clearly a bug here: tidy cannot determine that, nor does it honor the override setting.

SublimeLinter output

Here is the warning I am receiving when linting my php file:

[Screenshot: SublimeLinter output]

Terminal output running tidy directly

This is what happens when I run $ tidy original-index.php -o index.md against my original-index.php file:

line 106 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Apple macOS version 5.6.0">
<title></title>
</head>
<body>
</body>
</html>

What confuses me a lot is that tidy gives me the information notice `Document content looks like HTML5`, which makes me wonder how this is related to the `inserting missing 'title' element` warning that tidy keeps giving me. There should be no reason for it to pick up the doc content as html5. Nothing, and I mean nothing, even remotely looks like html5 in the PHP file.

Setting --show-warnings flag to false

If I try and change the --show-warnings flag to false in both tidy command line and in the SublimeLinter settings file I still get the warning output. It doesn't respect the --show-warnings flag either.

$ tidy original-index.php -o index.md --show-warnings false
line 106 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Apple macOS version 5.6.0">
<title></title>
</head>
<body>
</body>
</html>

In the SublimeLinter settings file:

"html-tidy":
{
    "@disable": false,
    "args": 
    [
        "--show-warnings",
        "false"
    ],
    "excludes": [],
    "ignore_match": [
        "missing <!DOCTYPE> declaration",
        "inserting missing 'title' element"
    ],
},

So, is this a bug that needs fixing or is this me missing something completely?

geoffmcl commented 6 years ago

@justinhartman thank you for the issue...

"I'm very confused how the file below can create this perception that the content is html5 content."

Well, do not be confused! You created this by passing the file to tidy. Tidy treats each and every file passed to it as html or xhtml, parses the content as that kind of markup, tries to fix it if it can, and reports warnings and errors...

And if tidy does not find a DOCTYPE it will assume html5...

Now it seems you passed an unclosed processing instruction block, <?php ..., so tidy just threw that away. If you had added the close it was looking for ... ?>, it would have kept it as part of the html document... Try the following php and you should see what I mean -

<?php
function foo()
{
   return 1;
} // end foo()
?>
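
As a rough illustration of the point about the missing close, a pre-check along these lines could flag a file whose last `<?php` block is never closed before it is handed to tidy. This is a minimal sketch under stated assumptions: `has_unclosed_php_block` is a hypothetical helper, and it does a plain text search, so a `?>` inside a string or comment would fool it.

```python
def has_unclosed_php_block(source):
    """Return True if the last '<?php' in source has no '?>' after it.

    Hypothetical helper: plain substring search, no real PHP parsing.
    """
    last_open = source.rfind("<?php")
    if last_open == -1:
        # No PHP block at all, so nothing can be unclosed.
        return False
    return source.find("?>", last_open) == -1


if __name__ == "__main__":
    closed = "<?php\nfunction foo()\n{\n   return 1;\n} // end foo()\n?>\n"
    unclosed = "<?php\nfunction foo()\n{\n   return 1;\n} // end foo()\n"
    print(has_unclosed_php_block(closed))
    print(has_unclosed_php_block(unclosed))
```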

But as I am sure you know, in an xhtml document, such processing instructions are usually within the document body, like -

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>#782-3</title>
</head>
<body>
<?php
function foo()
{
   return 1;
} // end foo()
?>
</body>
</html>

That file will pass tidy with no errors or warnings, as with the W3C validator...

So far I cannot see any tidy bug here, except maybe, but only maybe, that it could report something like a `line 1 column 1 - Warning: unclosed processing block discarded` on the above php sample, but that should be a separate issue. If someone is interested enough to look into this and raise an issue, that would be appreciated, and dealt with separately...

Now I know nothing about SublimeLinter, or Sublime Text editor, but it seems you should explore the multiple options for tidy - see QuickRef.html

But this is only if you are passing a [x]html document to tidy. Certainly not a pure php file, that is with no html markup...

For example, if you have no interest in the head part of the html, then --show-body-only yes might be what you need... That does avoid some head warnings...

If you do not want to see specific warnings, then discover their ID with --mute-id yes, and then supply a --mute <ID> option, one for each message you want to suppress...

Or even --show-warnings no to suppress all warnings and show only errors, if any. And/or --show-info no... And/or --quiet yes...
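
The options just mentioned can be assembled into a command line with a small helper like the one below. This is only a sketch: `build_tidy_command` and its parameters are hypothetical, the helper builds the argument list without running tidy, and the message ID in the usage example is an assumption for illustration (discover the real IDs with --mute-id yes). Options are placed before the file name.

```python
def build_tidy_command(html_path, mute_ids=(), show_warnings=True,
                       show_info=True, quiet=False):
    """Build a tidy argument list with all options before the input file.

    Hypothetical helper: it only assembles the list; pass the result to
    subprocess.run() if you actually want to invoke tidy.
    """
    command = ["tidy"]
    for message_id in mute_ids:
        # One --mute option per message ID, as suggested in the comment.
        command += ["--mute", message_id]
    if not show_warnings:
        command += ["--show-warnings", "no"]
    if not show_info:
        command += ["--show-info", "no"]
    if quiet:
        command += ["--quiet", "yes"]
    command.append(html_path)
    return command


if __name__ == "__main__":
    # MISSING_TITLE_ELEMENT is an assumed ID used for illustration only.
    print(build_tidy_command("page.html",
                             mute_ids=["MISSING_TITLE_ELEMENT"],
                             show_info=False))
```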

I have not explored all the options available, but it seems there should be one or more to get the tidy output you want...

Does this answer your question, or have I missed something? Thanks...

justinhartman commented 6 years ago

@geoffmcl thank you so much for (a) your detailed response and (b) being so diligent in helping me understand the underlying issue with this issue report. I completely get what you are saying here and can now see the error of my ways 😄

The main issue is that SublimeLinter executes all installed linters on a file, irrespective of file-type, file structure, etc. This effectively means that no matter what, when applying the lint function in Sublime using SublimeLinter it will process all installed linters available and in turn, each linter will return a set of results, and in this case errors.

"Now it seems you passed an unclosed processing instruction block, <?php ..., so tidy just threw that away. If you had added the close it was looking for ... ?>, it would have kept it as part of the html document... Try the following php and you should see what I mean."

Yes, running your example gave me the expected results and once my PHP codeblock was closed off the errors were omitted from the block itself and phpcs took over.

I don't know if SublimeLinter has the ability to turn off certain linters based on the file-type or code-block being checked but what is clear is that this is not an issue with tidy-html5 but rather an issue with either Sublime or SublimeLinter.

Thank you for your patience and detailed response to an issue that clearly wasn't an issue with tidy-html5. I appreciate it immensely and thank you for your time.

I am satisfied that this isn't an issue that should remain open so please feel free to close this without response.

Regards, Justin

justinhartman commented 6 years ago

@geoffmcl I do want to ask one question though. When I run $ tidy original-index.php -o index.md --show-warnings false why is it that tidy still shows me warning messages (irrespective that the file is pure PHP) when this flag is set? Surely the expected result is that tidy will only output errors and not warnings?

geoffmcl commented 6 years ago

@justinhartman glad to hear you seem to have found your problem... do not pass pure php only files to tidy... it will treat them as html markup code...

The command $ tidy original-index.php -o index.md --show-warnings false may be a problem. If the intended input file is original-index.php, then it seems to be in the WRONG order...

From the API docs you can see the command is tidy [[options] filename]

Note the input html filename comes last, and if none given tidy will read stdin...

The reading of stdin allows for $ echo "Hello World" | tidy -q --show-warnings no usage... try it...

Change that to $ tidy -o index.md --show-warnings false original-index.php, and you should get what you want...

And note that in fact you can have multiple input files, and multiple sets of options, like -

$ tidy [options1] [file1 [file2 ...] [options2] [file3...]]

This will process file1 ... with the first set of options1, then file3 with added options2, and so on...

Now in your case you have given $ tidy file1 options, which means tidy will process file1, then read the options, then use stdin for the final input file...
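
The ordering rule described in this comment can be modelled in a few lines. This is a simplified model of the behaviour as explained here, not tidy's actual argument parser: `plan_runs` and `VALUE_FLAGS` are hypothetical names, every `--name` option is assumed to take one value, and only the single-dash flags listed in `VALUE_FLAGS` are assumed to take a value.

```python
# Single-dash flags assumed to consume the next argument (simplification).
VALUE_FLAGS = {"-o", "-output", "-f", "-file", "-config"}


def plan_runs(argv):
    """Return (input, options_in_effect) pairs in processing order.

    Each file is paired with the options seen before it. Options that
    follow the last file, or a command line with no file at all, lead
    to one more run that reads stdin.
    """
    options = []
    runs = []
    options_pending = False  # options seen since the last input file
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("-"):
            takes_value = arg.startswith("--") or arg in VALUE_FLAGS
            width = 2 if takes_value else 1
            options.extend(argv[i:i + width])
            options_pending = True
            i += width
        else:
            runs.append((arg, list(options)))
            options_pending = False
            i += 1
    if options_pending or not runs:
        runs.append(("<stdin>", list(options)))
    return runs


if __name__ == "__main__":
    wrong = ["original-index.php", "-o", "index.md", "--show-warnings", "false"]
    right = ["-o", "index.md", "--show-warnings", "false", "original-index.php"]
    print(plan_runs(wrong))
    print(plan_runs(right))
```

In this model the first ordering processes original-index.php with no options in effect and then waits on stdin, while the second applies both options to the file.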

HTH...

Anyway, glad you closed this... thanks...

Sachelis commented 5 years ago

I am experiencing a similar problem with the "Inserting missing 'title' element" feature. I have files that are included in other files. Obviously those include files should not be passed to tidy, but I'm passing hundreds of files to tidy via a batch file and it's challenging to exclude a list of specific (and changing) files.

It seems like there should be a way to say, "Don't insert a missing doctype or title. If they're missing, just treat it as an error, and don't modify the file."
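
One possible way to keep include fragments out of such a batch is to pre-filter the file list before calling tidy. The sketch below is an assumption-laden heuristic, not a tidy feature: `is_full_html_document` and `select_for_tidy` are hypothetical helpers, and a file counts as a full document only if it contains a doctype or an opening html tag.

```python
import re

# Heuristic marker for a complete document: a doctype or an <html> tag.
_DOCUMENT_MARKER = re.compile(r"<!DOCTYPE\b|<html\b", re.IGNORECASE)


def is_full_html_document(text):
    """Return True if text contains a doctype or an opening <html> tag."""
    return _DOCUMENT_MARKER.search(text) is not None


def select_for_tidy(files):
    """Given a mapping of file name to content, keep full documents only."""
    return [name for name, text in files.items()
            if is_full_html_document(text)]


if __name__ == "__main__":
    batch = {
        "page.html": "<!DOCTYPE html>\n<html><head><title>x</title></head></html>",
        "menu.inc": "<ul><li>Home</li></ul>",
    }
    # Only page.html would be handed to tidy.
    print(select_for_tidy(batch))
```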

geoffmcl commented 5 years ago

@Sachelis this issue was closed, over a year ago, and your new comment here seems to be different, an entirely different request...

It seems you want an option, to sort of say, don't be tidy, or something... ;=))

i.e. Don't insert a missing doctype or title... don't modify the file! WOW really...

Why would you want that? What do you want tidy to do? Why are you passing this file to tidy in the first place... what gain are you looking for?

If you just want the body html, have you tried --show-body-only yes - API...

But other than that, to ignore missing doctype or title seems nearly impossible...

I just do not see a use case for that...but if you think you can build one, please open a new issue, with full explanation and reasoning... thanks

Meantime, this issue remains CLOSED... thanks...