Open cristianstaicu opened 7 years ago
anchor the regex
Could you please elaborate on the fix, I'm curious. Thank you!
Sorry, that was a standard comment I sent around. In this case it would not work! The anchoring solution is well suited when the regex is used for validation purposes. Let us say you want to check an input string is an email address. Instead of having a simple regex like:
/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]/
you should have something like:
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]$/
Word boundaries can also be used, as explained here: https://www.regular-expressions.info/email.html
This will decrease the complexity of the matching, since the engine will only try to match the regex once, not n times (where n is the length of the input string) as in the case of the first regex.
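To make the difference concrete, here is a quick Node.js sketch. The regexes are adapted slightly from the examples above (an `i` flag, and `+` on the TLD so a normal address passes); they are illustrative, not production-grade email validation:

```javascript
// Unanchored: the engine retries the match at every position of the input,
// and a partial match anywhere in the string passes "validation".
const loose = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]/i;

// Anchored: the whole input must be an email address, and the engine
// attempts the match only once, from the start of the string.
const strict = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+$/i;

console.log(loose.test('garbage a@b.c garbage'));  // true - partial match slips through
console.log(strict.test('garbage a@b.c garbage')); // false
console.log(strict.test('a@b.com'));               // true
```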
Thanks for the explanation!
I hope this gets fixed soon, it's giving my package red lights :(
This issue now seems to be picked up by an nsp check, making our builds red too. See https://nodesecurity.io/advisories/536
The slowdown is moderately low (for 50,000 characters around 2 seconds matching time).
Just wondering why this is a problem? For example, the string "underscore" algorithm could be a very complex one and might need many seconds to process a 50,000-character string.
I recommend the following two articles for understanding the issue: https://blog.liftsecurity.io/2014/11/03/regular-expression-dos-and-node.js/ https://dl.acm.org/citation.cfm?id=3065916
Basically, the problem comes from the fact that regex matching is done on the main loop of Node.js, and thus blocking this loop for 2 seconds is equivalent to blocking all incoming requests for 2 seconds.
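For illustration, here is a minimal, self-contained sketch of that blocking behaviour, using a classic "evil" pattern (not string.js's own regex):

```javascript
// Nested quantifiers force exponential backtracking when the input
// almost matches but fails at the very end.
const evil = /^(a+)+$/;

function timeMatch(n) {
  const input = 'a'.repeat(n) + '!'; // trailing '!' guarantees failure
  const start = Date.now();
  evil.test(input);
  return Date.now() - start;
}

// Each extra 'a' roughly doubles the work. While the regex runs, the
// Node.js event loop is blocked and cannot serve any other request.
// Raising n into the high twenties makes the call take seconds.
console.log(timeMatch(18), 'ms for 18 chars');
console.log(timeMatch(22), 'ms for 22 chars');
```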
Thanks for the info. The example in the first link is a classic evil regex and it can be written in a more efficient way:
var emailExpression = /^([a-zA-Z0-9_.-])+\@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/;
The regex used in the "underscore" case is fine: /([A-Z\d]+)([A-Z][a-z])/g , which may not incur much performance penalty compared to an alternative implementation. I understand the impact of slowing regex matching to an application. My point is that if a regular expression is used correctly, then it is not a problem to be slow.
For posterity, Snyk also reports it: https://snyk.io/vuln/npm:string:20170907
I'm looking forward to a solution. The HTML-stripping features of the competing libraries, for example, are very poor, and I can't find an alternative to String.js.
Hey there! I haven't looked too closely at this issue but it was linked by @revelt above. I'd definitely suggest using the striptags library for removing HTML. It is based on the PHP function strip_tags, which is widely known to be safe.
In the README I link to two articles, one indicating that regular expressions are not safe for removing dangerous HTML (here), and the second demonstrating that PHP's strip_tags function is not vulnerable (here).
https://github.com/ericnorris/striptags — take a look to see if that works for you. It uses a state machine very similar to the one in the PHP internals instead of a regular expression; as a result it is super fast and just as secure!
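For anyone curious, the state-machine idea can be sketched in a few lines. This is an illustrative toy, not the actual striptags implementation (which also handles allowed tags, quoted attributes, comments, etc.):

```javascript
// Minimal state-machine tag stripper: walk the string once, toggling an
// "inside a tag" flag. No regex, so no backtracking and no ReDoS.
function stripTagsSketch(html) {
  let out = '';
  let inTag = false;
  for (const ch of html) {
    if (ch === '<') inTag = true;       // entering a tag: stop copying
    else if (ch === '>') inTag = false; // leaving a tag: resume copying
    else if (!inTag) out += ch;         // copy only text outside tags
  }
  return out;
}

console.log(stripTagsSketch('aaa<script src="evil.js">x</script>bbb')); // 'aaaxbbb'
```

Note that, like striptags, this treats a dangling `<` as the start of a tag, so `'aaa<bbb'` becomes `'aaa'` — the behaviour debated further down this thread.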
This is a critical issue. The author and the contributors don't seem to care at all. Kill this project with fire.
@steve come on, it's not that bad :) team is just regrouping, I'm sure guys will fix that eventually
@jprichardson, string.js is a nice library and I would prefer to continue using it; however, a library with such a vulnerability isn't something that can be used in production. Are you aware of this issue, or has the project been abandoned, as the last update was about 11 months ago?
@revelt, I can appreciate the concerns raised by @steve and @yieme correctly points out lack of update activity for almost a year. This vulnerability has been live for over a month. At what point should we consider "eventually" unacceptable?
@nconf-base I think now, because if you have string.js in production, your code is vulnerable now. Personally, I'm going to switch to alternatives asap and equally come back asap when it's fixed.
As far as my stuff is concerned, I have already replaced every string.js method with alternatives in my libraries, except I can't find a good-quality, nothing-assumed tool to strip HTML tags from strings. Within October I plan to release an alternative, string-only, no-regex library which strips HTML, which will untie me completely from string.js.
I'm thinking, it can't be that all the methods of string.js are affected, can it? If some are OK, maybe we could split string.js into separate libs like lodash does with lodash.* and so on? This way, affected methods/libraries would be isolated and it would be easier to pinpoint the culprits. You know, a divide-and-conquer approach...
@revelt, I think you are correct; the issue only affects a couple of the methods, specifically underscore and unescapeHTML, as per https://nodesecurity.io/advisories/536
I think your idea of decomposing the larger string.js library into smaller pieces might help those who use the library without those particular methods. Splitting out the problem methods on a fork might be a start.
How does your strip tags method differ from https://github.com/ericnorris/striptags that eric pointed out?
@nconf-base it assumes too much, which results in a bunch of false positives. I just scratched the surface with that issue; many of my unit tests fail because of striptags if I switch. Same with other similarly-named libraries from the first npm results page; I raised issues on at least two others.
For the record, string.js copes fine stripping HTML from stuff like aaa<bbb.
@revelt, I'm not sure I would agree that striptags results in "false positives". Any function that ostensibly strips HTML must assume all input is HTML, and as such must remove tags, incomplete or not.
Leaving aaa<bbb may result in unsafe HTML if it is concatenated with stuff on the page. A more concrete example: if it is fed aaa<script src="evil.js", the tag is incomplete, but when placed on a page (or concatenated with other safe HTML) it may end up producing valid HTML that results in evil.js executing. That's not acceptable for a library responsible for removing HTML.
If the input is "safe", then < should be HTML encoded, e.g. aaa&lt;bbb.
@nconf-base I would strongly suggest using striptags! I am, of course, biased, but it is based on the very popular function from PHP (http://php.net/manual/en/function.strip-tags.php) and is battle-tested.
@ericnorris I see your point. Basically, there are two kinds of HTML-stripping cases, and each of us is a proponent of one of them.
My case is Detergent.js, an email text copy cleaning app. HTML can come in the input, there is a high risk of false positives, but zero chance of rogue code being executed, and if code does come, it comes unescaped. This way, code like a < b and c > d is very likely and legit (consider even b as a valid tag name).
Your case is regular web-dev ops, where you clean HTML in order to defuse potentially malicious code, prioritising security over the risk of false positives. All code is assumed to be escaped, and unescaped code is a no-no.
String.js was this middle-ground solution, being able to detect aaa<bbb correctly, yet, at the same time, reliably strip malicious HTML for years in everybody's apps. It tended both sides. I'd gladly use striptags if it tended both sides as well.
Do you see my point?
@revelt that's not tending both sides. < is never safe in HTML, and under some legacy parsing modes I think the browser may even auto-complete what it thinks is an incorrectly truncated tag with >. Any library that leaves any < inside something to be embedded into HTML is simply unsafe to use for HTML output, plain and simple (since < outside of an HTML tag is never valid).
Edit: If you refer to the use case of stripping first and then replacing any remaining < and > afterwards with &lt; and &gt; anyway, I suppose that's fair enough.
@JonasT Yes, you are right, it's not safe and not valid in HTML. But... Detergent accepts both HTML and pure text and outputs text with sprinkled HTML or without. There is a thin path in here where > is valid both as input and output. For example, if people want to remove invisible characters from some text copy, like a < b and c > d\u0003 (invisible ETX character at the end). That's a legit request and Detergent will do that. I want to produce a < b and c > d, not a d\u0003 or a d. Do you see my point?
Since the user pastes the copy into Detergent, the input is assumed to be safe, so I see no problem letting them output unencoded < if they wish (encoding is turned on by default, but it can be turned off).
Having said that, I'm about to publish my type of HTML stripping tool which behaves like that.
@az7arul Are you currently maintaining this or who is in charge here?
Hi @StorytellerCZ, I wasn't maintaining this for a while. But for this I can take a look this weekend, meanwhile any PR that address this is welcome.
// This function converts a binary string (can be HTML or JavaScript) to a safe string that can be used in an HTML file within JavaScript tags.
// Assumed helper: toHex() was not included in the original paste.
function toHex(C) {
  var h = C.toString(16).toUpperCase()
  return h.length < 2 ? '0' + h : h
}
function EncodeHTML(S) {
  var i, C
  S = S.split("")
  for (i = 0; i < S.length; i++) {
    C = S[i].charCodeAt(0)
    if (C == 9) S[i] = "\\t"         // tab
    else if (C == 10) S[i] = "\\n"   // line feed
    else if (C == 13) S[i] = "\\r"   // carriage return
    else if (C == 92) S[i] = "\\\\"  // backslash
    else if (C == 34 || C == 38 || C == 39 || C == 60 || C == 62 || C < 32 || C > 126) S[i] = "\\x" + toHex(C)
  }
  return '"' + S.join("") + '"'
}
// This function converts a JS-safe string with quotation marks around it back to its original source. It's the opposite of EncodeHTML().
function DecodeHTML(S) {
  S += ""
  if (S.length > 1) {
    if (S.charAt(0) == '"' || S.charAt(0) == "'") S = S.slice(1, -1)
  }
  S = S.split("\\t").join("\t").split("\\r").join("\r").split("\\n").join("\n").split("\\\\").join("\\").split("\\x")
  for (var i = 1; i < S.length; i++) S[i] = String.fromCharCode(parseInt(S[i].substr(0, 2), 16)) + S[i].slice(2)
  return S.join("")
}
Oops. I originally pasted the above as plain text rather than in a CODE section, so the double backslash characters came out as single ones; they are restored here. :/
Hello, any news about this ? Thanks
@az7arul, thank you for looking into this. We have nsp check as part of our deploy process and this security issue breaks the build.
I looked into this briefly and used https://github.com/jagracey/RegEx-DoS to scan for other vulnerabilities and more were reported by the scanner. I am afraid I have no quick fix for this right now.
I don't have a timeframe when this is going to be solved though.
Hi all,
This should fix the 1st vulnerability:
function underscore (str) {
var s = str.trim().replace(/\s*([A-Z])/g, function (_, m) {
return '_' + m.toLowerCase()
})
return s
}
const assert = require('assert')
let s = underscore('oneAtATime AnotherWordAtATime')
assert.equal(s, 'one_at_a_time_another_word_at_a_time')
The 2nd one should be gone by limiting the number of chars an entity might have. To my knowledge <blockquote> or <figcaption> are the longest HTML element names.
/\&([^;]+);/ --> /&([^;]{1,20});/
BTW, there is no need to escape the &
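To illustrate, here is a sketch of an unescapeHTML built on the bounded regex; the entity table and the fall-through behaviour are assumptions for the example, not string.js's actual implementation:

```javascript
// Bounded entity matching: {1,20} caps how far the engine can scan for a
// terminating ';', keeping matching cheap even on hostile input.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function unescapeHTML(str) {
  return str.replace(/&([^;]{1,20});/g, function (match, entity) {
    if (NAMED.hasOwnProperty(entity)) return NAMED[entity];
    if (entity.charAt(0) === '#') {
      const code = entity.charAt(1) === 'x' || entity.charAt(1) === 'X'
        ? parseInt(entity.slice(2), 16)   // hex form, e.g. &#x41;
        : parseInt(entity.slice(1), 10);  // decimal form, e.g. &#65;
      if (!isNaN(code)) return String.fromCharCode(code);
    }
    return match; // unknown entity: leave it untouched
  });
}

console.log(unescapeHTML('&lt;b&gt; &amp; &#65;')); // '<b> & A'
```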
@cristianstaicu et al., it would be nice if you could review these thoughts. Thanks.
Is there any alternative package for this library? We are using a few functions, mostly humanize. We are resolving security concerns in our project.
@ankurloriya No, an all-in-one monolithic replacement doesn't exist.
But you can replace its methods one-by-one with libraries from npm. For example, I replaced replaceAll() with replace-string and lines() with split-lines. There were no suitable HTML tag stripping libraries to replace stripTags(), so I coded my own. If your unit tests are thorough, they will pick up any inconsistencies (or their absence) in a replacement library. I found it difficult to judge replacements by their readmes; my unit tests were the judge.
Performance-wise, switching from the monolithic string.js to separate libs will reduce your bundle's footprint too. You might even get some of them as Rollup'ed 3-in-1 builds today, where your Webpack/Rollup will tap their ES module build directly. On the other hand, it will take time for string.js to migrate to Rollup; it doesn't have an ES Module build at the moment.
We changed to using various other libraries due to this security issue. It appears the project may be abandoned, with #218 noting the http://stringjs.com/ domain apparently going away. Sad; it was a good library to have a lot of useful tools in one place. Hopefully it gets turned around for people. We just couldn't wait any longer for the security issue to be resolved. Cheers.
The following regular expressions used in underscore and unescapeHTML methods are vulnerable to ReDoS:
The slowdown is moderately low (for 50,000 characters around 2 seconds matching time). I would suggest one of the following:
If needed, I can provide an actual example showing the slowdown.