redhataccess / pantheon-cmd

Pantheon CMD is an open source and freely distributed program for validating and building local previews of modular documentation.
GNU General Public License v3.0

Find a way to avoid false-positive HTML markup detection while validating files containing XML snippets. #23

Closed inoxx03 closed 3 years ago

inoxx03 commented 3 years ago

The validation script currently checks for HTML markup by using a regular expression that matches the syntax of HTML tag pairs. The check is implemented in pcchecks.py.

This rule inadvertently picks up XML snippets and other markup that matches the same structure, causing false-positive validation failures for .adoc files that contain snippets of XML markup (such as example pom.xml configurations).
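To illustrate the false positive, here is a minimal sketch. The pattern below is an assumption written for this example and is not copied from pcchecks.py; it only shows how a regex that looks for an opening tag followed by a matching closing tag cannot tell HTML from XML.

```python
import re

# Hypothetical tag-pair pattern (NOT the actual regex from pcchecks.py):
# an opening tag, any content, then a closing tag with the same name.
TAG_PAIR = re.compile(r"<([A-Za-z][\w.-]*)\b[^>]*>.*?</\1>", re.DOTALL)


def find_tag_pairs(text):
    """Return every substring that looks like <tag>...</tag>."""
    return [m.group(0) for m in TAG_PAIR.finditer(text)]


html_line = "Click <b>Save</b> to continue."
xml_line = "<groupId>org.example</groupId>"

print(find_tag_pairs(html_line))  # real HTML markup is reported
print(find_tag_pairs(xml_line))   # the XML snippet is reported as well
```

Both lines produce a match, so a pom.xml excerpt fails validation exactly as inline HTML would.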

Also, our Supplementary Style Guide recommends that we surround references to replaceable values with angle brackets (< >). These could also fairly easily be interpreted as HTML, leading to validation failures for some modules.

I understand that, from an implementation standpoint, this might be a very tricky problem to solve, because it would essentially require us to implement an HTML validator within the validator script, which would be extremely work- and resource-intensive.

As a temporary solution, we might:

  1. implement an option that ignores the HTML rule in pcchecks.py when you run the validator script (unsafe).
  2. implement an option that lowers the error level for HTML markup to WARN instead of FAIL (still unsafe but less so).
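Both options could share a single switch. The following is only a sketch under assumptions: the `--html-level` flag, the `check_html` and `exit_code` helpers, and the tag-pair pattern are hypothetical names invented for this example and do not exist in Pantheon CMD.

```python
import argparse
import re

# Hypothetical tag-pair pattern (NOT the actual regex from pcchecks.py).
TAG_PAIR = re.compile(r"<([A-Za-z][\w.-]*)\b[^>]*>.*?</\1>", re.DOTALL)


def build_parser():
    """Build a parser with a hypothetical --html-level option."""
    parser = argparse.ArgumentParser(
        description="Sketch of a validator with a configurable HTML rule"
    )
    parser.add_argument(
        "--html-level",
        choices=["fail", "warn", "skip"],
        default="fail",
        help="fail: report as errors; warn: report as warnings; skip: ignore the rule",
    )
    return parser


def check_html(text, level="fail"):
    """Return (severity, message) tuples for HTML-like markup in text."""
    if level == "skip":  # option 1: ignore the HTML rule entirely
        return []
    severity = "FAIL" if level == "fail" else "WARN"  # option 2: downgrade
    return [
        (severity, f"HTML-like markup found: {m.group(0)}")
        for m in TAG_PAIR.finditer(text)
    ]


def exit_code(findings):
    """Only FAIL findings make the run fail; warnings are informational."""
    return 1 if any(severity == "FAIL" for severity, _ in findings) else 0
```

With `--html-level warn`, an XML snippet would still be reported but would no longer fail the run; with `--html-level skip`, it would not be reported at all.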

Please let me know your thoughts. I might be missing some key technical knowledge and not seeing a potentially more suitable solution.

Levi-Leah commented 3 years ago

A workaround would be to exclude HTML markup that’s in italics (_<some>html</some>_) or in backticks (`<some>html</some>`). <value name> or <value _name> won’t be picked up by the regex because neither is followed by a closing tag (</...>).
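A rough sketch of that workaround, assuming the check runs on the raw module text: remove backtick and italic spans first, then apply the tag-pair match to what is left. All patterns and the function name here are hypothetical and written for this example; they are not taken from pcchecks.py.

```python
import re

# Hypothetical tag-pair pattern (NOT the actual regex from pcchecks.py).
TAG_PAIR = re.compile(r"<([A-Za-z][\w.-]*)\b[^>]*>.*?</\1>", re.DOTALL)

# Inline spans to exclude before checking: `monospace` and _italic_.
# The italic pattern requires non-word characters around the underscores,
# so an underscore inside a name such as user_name is left alone.
BACKTICK_SPAN = re.compile(r"`[^`\n]+`")
ITALIC_SPAN = re.compile(r"(?<!\w)_[^_\n]+_(?!\w)")


def find_html_outside_inline_formatting(text):
    """Return tag pairs that are not wrapped in backticks or italics."""
    stripped = BACKTICK_SPAN.sub("", text)
    stripped = ITALIC_SPAN.sub("", stripped)
    return [m.group(0) for m in TAG_PAIR.finditer(stripped)]


print(find_html_outside_inline_formatting("Use _<some>html</some>_ here."))
print(find_html_outside_inline_formatting("Use `<some>html</some>` here."))
print(find_html_outside_inline_formatting("Click <b>Save</b> to continue."))
print(find_html_outside_inline_formatting("Replace <value name> as needed."))
```

Only the unformatted `<b>Save</b>` is reported. This sketch handles inline formatting only; XML inside a delimited source block would still be flagged unless such blocks are skipped as well.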