showdownjs / showdown

A bidirectional Markdown to HTML to Markdown converter written in Javascript
http://www.showdownjs.com/
MIT License
14.26k stars 1.56k forks source link

Possible false positive reported by CodeQL static code analyser in release 2.0.0 #888

Closed elrido closed 2 years ago

elrido commented 2 years ago

When updating to showdown 2.0.0 in another project, the CodeQL static code analyser used there reported an issue with the following line: https://github.com/showdownjs/showdown/blob/32a1aaa39b39cff6d9a85b070d64db6829b79b26/src/subParsers/makeMarkdown/txt.js#L15-L19 The report states that:

This does not escape backslash characters in the input.

Sanitizing untrusted input is a common technique for preventing injection attacks such as SQL injection or cross-site scripting. Usually, this is done by escaping meta-characters such as quotes in a domain-specific way so that they are treated as normal characters.

However, directly using the string replace method to perform escaping is notoriously error-prone. Common mistakes include only replacing the first occurrence of a meta-character, or backslash-escaping various meta-characters but not the backslash itself.

In the former case, later meta-characters are left undisturbed and can be used to subvert the sanitization. In the latter case, preceding a meta-character with a backslash leads to the backslash being escaped, but the meta-character appearing un-escaped, which again makes the sanitization ineffective.

Even if the escaped string is not used in a security-critical context, incomplete escaping may still have undesirable effects, such as badly rendered or confusing output.

It goes on to recommend to "Use a (well-tested) sanitization library if at all possible.", which showdown is, IMHO.

I'm not sure if this really is an issue or a false positive by an overzealous tool and wanted to get a second, human opinion.

rugk commented 2 years ago

More information about CodeQL: https://codeql.github.com/ (You can set it up on GitHub and it is really nice.) More information on that warning: https://codeql.github.com/codeql-query-help/javascript/js-incomplete-sanitization/

tivie commented 2 years ago

This block of code is not for sanitization or security purposes. This is a step in converting HTML to markdown, to prevent creation of markdown by mistake.

For instance, take this input:

<p>nickname: _randomNickname_</p>

The md equivalent would be:

nickname: \_randomNickname\_

backslashes are added since a word wrapped in underscores means emphasis in markdown. This block of code does this

elrido commented 2 years ago

Thank you for your time and assessment. Wouldn't markdown need to escape backslashes (legal in HTML), tough? i.e.:


<p>file path: C:\random\windows\path</p> -> file path: C:\\random\\windows\\path
tivie commented 2 years ago

No. Backslashes are literal in markdown, except when preceding the following characters:

\   backslash
`   backtick
*  asterisk
_   underscore
{}  curly braces
[]  square brackets
()  parentheses
#   hash mark
+   plus sign
-   minus sign (hyphen)
.   dot
!   exclamation mark

So, in your example, writing C:\\random\\windows\\path and C:\random\windows\path is functionally the same thing, regarding output.

We could escape them, but weird stuff might arise from it. (although I will look into it).

tivie commented 2 years ago

Regarding your question, it seems to me that this is indeed a false positive.

elrido commented 2 years ago

Yeah, the docs are a bit vague on that point, backslash is listed as something that can be escaped, but it is probably a bit of undefined behavior how different rendering engines might handle encountering a backslash not followed by one of the characters listed - most would probably just ignore it.

I could only imagine this to be a risk in edge cases, if several HTML snippets get converted to markdown and finally assembled and it so happens to change the output when rendered as a whole.

Given all this and that the project I work on only converts whole markdown documents to HTML and not the other way around or backwards and forwards I think it is not a concern for our use case.

Thank you again for the feedback!