svenkreiss / html5validator

Command line tool to validate HTML5 files. Great for continuous integration.
MIT License

Document language inspection is flaky #32

Closed php-coder closed 6 years ago

php-coder commented 7 years ago

I see that version 0.2.7 has started to report the following warnings:

WARNING:html5validator.validator:"file:/home/travis/build/php-coder/mystamps/src/main/webapp/WEB-INF/views/error/403.html":1.16-5.73: info warning: This document appears to be written in Danish but the "html" start tag has "lang="en"". Consider using "lang="da"" (or variant) instead.
"file:/home/travis/build/php-coder/mystamps/src/main/webapp/WEB-INF/views/error/500.html":1.16-5.73: info warning: This document appears to be written in Danish but the "html" start tag has "lang="en"". Consider using "lang="da"" (or variant) instead.

Here are these files:

php-coder commented 7 years ago

I forgot to mention how I'm executing html5validator:

html5validator \
    --root src/main/webapp/WEB-INF/views \
    --ignore-re 'Attribute “(th|sec|togglz|xmlns):[a-z]+” not allowed' \
        'Attribute “(th|sec|togglz):[a-z]+” is not serializable' \
        'Attribute with the local name “xmlns:[a-z]+” is not serializable' \
        'An "img" element must have an "alt" attribute' \
        'The first child "option" element of a "select" element with a "required" attribute' \
    --show-warnings

php-coder commented 7 years ago

I couldn't reproduce it locally on macOS, but it is 100% reproducible on Travis CI.

php-coder commented 7 years ago

Could be related to https://github.com/validator/validator/issues/493

php-coder commented 7 years ago

It reproduces on Linux.

php-coder commented 7 years ago

@sideshowbarker Could you look at it? I suspect that it's not an html5validator problem but rather a problem in the validator itself.

sideshowbarker commented 7 years ago

@php-coder I’m not able to reproduce the problem when I directly check the documents using the W3C HTML Checker:

svenkreiss commented 7 years ago

@php-coder Can you link to a Travis build where this error occurred?

php-coder commented 7 years ago

Yes, here they are:

php-coder commented 6 years ago

@sideshowbarker @svenkreiss Hi, this issue has started to appear again from time to time, but now it says that the document is in French.

Could you suggest a way to debug it? Would enabling verbose mode help? Where are the sources of this check, so I can read and play with the code? Thanks!

sideshowbarker commented 6 years ago

My only suggestion as far as debugging is to see if you can reproduce it with the vnu.jar directly or with https://checker.html5.org/ or https://validator.w3.org/nu/
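
One way to act on this suggestion is to run vnu.jar on the same file twice, with and without language detection, and compare the messages. The following is a minimal Python sketch, assuming a local copy of vnu.jar and Java on the PATH. The helper names and the jar location are hypothetical and not part of either project; `--no-langdetect` is the checker option mentioned later in this thread.

```python
import shlex
import subprocess


def build_vnu_command(jar_path, html_files, langdetect=True):
    """Return the argv list for checking files with vnu.jar directly."""
    cmd = ["java", "-jar", jar_path]
    if not langdetect:
        # Ask the checker to skip its document-language detection.
        cmd.append("--no-langdetect")
    cmd.extend(html_files)
    return cmd


def run_vnu(jar_path, html_files, langdetect=True):
    """Run vnu.jar and return (exit code, stdout, stderr).

    Both streams are returned so the caller does not have to know
    which one the checker writes its messages to.
    """
    result = subprocess.run(
        build_vnu_command(jar_path, html_files, langdetect),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Hypothetical jar location; the HTML path is taken from the report above.
    page = "src/main/webapp/WEB-INF/views/error/403.html"
    for detect in (True, False):
        # Only print the commands here, so the sketch runs without Java.
        print(shlex.join(build_vnu_command("vnu.jar", [page], langdetect=detect)))
```

If the language warning appears only in the run without `--no-langdetect`, the problem is in the checker's language detection rather than in html5validator.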

svenkreiss commented 6 years ago

I agree, you will have to look into the Java part for debugging this.

Also just had a look at your 403.html file. It actually is a template file and not pure HTML. I wouldn't be surprised if the special characters trip up the language detection.

sideshowbarker commented 6 years ago

> Also just had a look at your 403.html file. It actually is a template file and not pure HTML. I wouldn't be surprised if the special characters trip up the language detection.

Yeah, if that’s the case, then all bets are off as far as the HTML checker backend behavior goes — and not just specifically for language detection. The checker isn’t a tool for checking pre-parsed PHP or template content. It’s intended for checking the HTML contents as they would be sent over the wire.

php-coder commented 6 years ago

> It actually is a template file and not pure HTML.

It should be valid HTML that just has a bunch of non-standard th:* attributes.

@svenkreiss Could you add an option to html5validator that exposes --no-langdetect, then?

svenkreiss commented 6 years ago

Yes, that sounds like a good idea.

svenkreiss commented 6 years ago

Version 0.2.10, which has this command-line option, is now on PyPI. Does that help?
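
For a CI script, the invocation from earlier in this thread could be assembled with the new option roughly as follows. This is a sketch only: it assumes the option is exposed under the same name as the checker's `--no-langdetect`, and the helper name `build_html5validator_command` is hypothetical.

```python
def build_html5validator_command(root, ignore_re=(), show_warnings=True,
                                 langdetect=False):
    """Return the argv list for an html5validator run.

    Assumes html5validator >= 0.2.10 and that the option is spelled
    --no-langdetect, as requested in this thread.
    """
    cmd = ["html5validator", "--root", root]
    if ignore_re:
        # --ignore-re takes one or more patterns; the next option ends the list.
        cmd.append("--ignore-re")
        cmd.extend(ignore_re)
    if show_warnings:
        cmd.append("--show-warnings")
    if not langdetect:
        cmd.append("--no-langdetect")
    return cmd


if __name__ == "__main__":
    cmd = build_html5validator_command(
        "src/main/webapp/WEB-INF/views",
        ignore_re=['An "img" element must have an "alt" attribute'],
    )
    print(cmd)
```

The list can be passed to `subprocess.run(cmd)` as is, which avoids shell quoting problems with the patterns.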

php-coder commented 6 years ago

Thanks! Looks like it worked!