Pandoc error when producing example report

mikehoyio commented 3 years ago

I cloned from master today and ran the example report with:

  ./tm.py --report docs/template.md | pandoc -f markdown -t html > report.html

This produces the following error:

  pandoc: (TagClose "script") is not a TagOpen
  CallStack (from HasCallStack):
    error, called at src/Text/HTML/TagSoup/Type.hs:128:19 in tgsp-0.14.8-2271f385:Text.HTML.TagSoup.Type

I think the error is with pandoc's formatting of the following threat:

{ "SID": "SC04", "target": ["Server"], "description": "XSS Using Alternate Syntax", "details": "An adversary uses alternate forms of keywords or commands that result in the same action as the primary form but which may not be caught by filters. For example, many keywords are processed in a case insensitive manner. If the site's web filtering algorithm does not convert all tags into a consistent case before the comparison with forbidden keywords it is possible to bypass filters (e.g., incomplete black lists) by using an alternate case structure. For example, the script tag using the alternate forms of Script or ScRiPt may bypass filters where script is the only form tested. Other variants using different syntax representations are also possible as well as using pollution meta-characters or entities that are eventually ignored by the rendering engine. The attack can result in the execution of otherwise prohibited functionality.", "Likelihood Of Attack": "High", "severity": "High", "condition": "target.sanitizesInput is False or target.validatesInput is False or target.encodesOutput is False", "prerequisites": "Target client software must allow scripting such as JavaScript.", "mitigations": "Design: Use browser technologies that do not allow client side scripting.Design: Utilize strict type, character, and encoding enforcementImplementation: Ensure all content that is delivered to client is sanitized against an acceptable content specification.Implementation: Ensure all content coming from the client is using the same encoding; if not, the server-side application must canonicalize the data before applying any filtering.Implementation: Perform input validation for all remote content, including remote and user-generated contentImplementation: Perform output validation for all remote content.Implementation: Disable scripting languages such as JavaScript in browserImplementation: Patching software. 
There are many attack vectors for XSS on the client side and the server side. Many vulnerabilities are fixed in service packs for browser, web servers, and plug in technologies, staying current on patch release that deal with XSS countermeasures mitigates this.", "example": "In this example, the attacker tries to get a script executed by the victim's browser. The target application employs regular expressions to make sure no script is being passed through the application to the web page; such a regular expression could be ((?i)script), and the application would replace all matches by this regex by the empty string. An attacker will then create a special payload to bypass this filter: <scriscriptpt>alert(1)</scscriptript> when the applications gets this input string, it will replace all script (case insensitive) by the empty string and the resulting input will be the desired vector by the attacker. In this example, we assume that the application needs to write a particular string in a client-side JavaScript context (e.g., <script>HERE</script>). For the attacker to execute the same payload as in the previous example, he would need to send alert(1) if there was no filtering. The application makes use of the following regular expression as filter ((w+)s*(.*)|alert|eval|function|document) and replaces all matches by the empty string. For example each occurrence of alert(), eval(), foo() or even the string alert would be stripped. An attacker will then create a special payload to bypass this filter: this['al' + 'ert'](1) when the applications gets this input string, it won't replace anything and this piece of JavaScript has exactly the same runtime meaning as alert(1). 
The attacker could also have used non-alphanumeric XSS vectors to bypass the filter; for example, ($=[$=[]][(__=!$+$)[_=-~-~-~$]+({}+$)[_/_]+($$=($_=!''+$)[_/_]+$_[+$])])()[__[_/_]+__[_+~$]+$_[_]+$$](_/_) would be executed by the JavaScript engine like alert(1) is.", "references": "https://capec.mitre.org/data/definitions/199.html, http://cwe.mitre.org/data/definitions/87.html" },

Specifically I think it's having issues with the * characters following a script element.

A file with just this content, run through pandoc, produces the same error:

  <p>, <script>HERE</script> (w+)s\*(.\*)|</p>

Running on macOS with pandoc version:

  pandoc 2.13
  Compiled with pandoc-types 1.22, texmath 0.12.2, skylighting 0.10.5, citeproc 0.3.0.9, ipynb 0.1.0.1

I appreciate this might be an issue with pandoc but perhaps there is something we can do to escape characters in the description of the threat.

nineinchnick commented 3 years ago

You're right, the description is not properly encoded. Our templating engine is super simple, so we need to figure out whether we can add a filter function and call it in the template, or whether the data passed to the template needs to be encoded earlier.
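A filter function called from the template could look roughly like the sketch below. It assumes a template engine built on Python's `string.Formatter`. The class name `EscapingFormatter` and the `escape` format spec are hypothetical, made up for illustration, and are not part of pytm.

```python
import html
import string


class EscapingFormatter(string.Formatter):
    """Hypothetical formatter that understands an 'escape' format spec."""

    def format_field(self, value, format_spec):
        # '{field:escape}' HTML-escapes the value; any other spec
        # falls through to the default formatting behavior.
        if format_spec == "escape":
            return html.escape(str(value))
        return super().format_field(value, format_spec)


formatter = EscapingFormatter()
rendered = formatter.format(
    "<p>{example:escape}</p>",
    example="<script>HERE</script>",
)
print(rendered)
```

With this approach the template markup stays as raw HTML, and only the substituted values are escaped.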

nineinchnick commented 3 years ago

Removing this part from the provided example in docs/template.md allows it to produce report.html:

  <h6>Example Instances</h6>
  <p>{{item.example}}</p>

mikehoyio commented 3 years ago

Thanks for the quick reply! :)

Yes, I was also able to produce a report by just removing that script tag.

izar commented 3 years ago

I got it fixed by replacing < and > with &123; and &125;. Is this a change that would be beneficial to make for all < and > in the JSON file?

nineinchnick commented 3 years ago

But some users might render reports just as markdown files, not HTML. IMHO we should just remove these tags from the example for now and replace the template engine in the long term, or add filter functions.

izar commented 3 years ago

Markdown should honor the Unicode representation. It might need a &# instead of a &. I believe that would be the cleanest solution.
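As a quick check of the numeric form, here is a small sketch using Python's standard `html.unescape`. Numeric character references need the `&#` prefix. Code points 60 and 62 are the angle brackets, while 123 and 125 are the curly braces. Whether a particular markdown renderer decodes these references is not verified here.

```python
import html

# Numeric character references for the angle brackets.
angle = html.unescape("&#60;script&#62;")

# Code points 123 and 125 decode to curly braces, not angle brackets.
curly = html.unescape("&#123;script&#125;")

# Without the '#', the text is not a numeric reference and is left alone.
no_hash = html.unescape("&123;script&125;")

print(angle)
print(curly)
print(no_hash)
```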

nineinchnick commented 3 years ago

So can we call html.escape() when reading the threat file? Keeping unencoded values in the JSON file would make it easier to manage them.
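A minimal sketch of that idea, assuming the threat file is a JSON list of objects like the one quoted above. The function name `load_threats` is hypothetical and is not pytm's actual loader.

```python
import html
import json


def load_threats(raw_json):
    """Parse threat definitions and HTML-escape every string value.

    Hypothetical helper: non-string values (such as the 'target' list)
    are passed through unchanged.
    """
    threats = json.loads(raw_json)
    return [
        {
            key: html.escape(value) if isinstance(value, str) else value
            for key, value in threat.items()
        }
        for threat in threats
    ]


sample = '[{"SID": "SC04", "target": ["Server"], "example": "<script>HERE</script>"}]'
escaped = load_threats(sample)
print(escaped[0]["example"])
```

The JSON file keeps its readable, unencoded values, and the escaping happens once at load time. A field such as `condition`, which looks like an expression rather than display text, would probably need to be excluded from escaping.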

izar commented 3 years ago

Sounds good, let me go try that.


izar / pytm

Pandoc error when producing example report #147