Closed merbinr closed 1 year ago
Apparently CVE-2023-31606 is assigned for this issue.
@e23e Thanks so much for investigating this! We're using it on our platform, and would like to stay with Textile if possible. Do you have a fix for the issue already? If not, I think I can help with the RegExp, and then we could at least create a fork 🤔
Changing the regex to:
text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|
already speeds this up by 30x.
Edit: The version below doesn't work! Use this one instead: https://github.com/jgarber/redcloth/issues/73#issuecomment-1612198953.
Allowing open tags (without a >
) fixes the issue completely, but this might change the behavior in unexpected ways:
text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?>?)/ ) do |m|
…
"<#{raw[1]}#{pcs.join " "}#{raw[4]}"
Hi @korny Thank you for looking into it. I appreciate your thoughts on fixing this. The solution you provided above fixes the issue partially, but I would like to see if we can fix it completely,
def clean_html( text, allowed_tags = BASIC_TAGS )
text.gsub!( /<!\[CDATA\[/, '' )
text.gsub!( /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|
The clean_html
function is used to clean the not allowed tags. I'm not an expert in building regex. I guess this regex detects tags in a string/paragraph. So if you could build a regex without backtracking
that detects the tags, we can completely fix this.
Could you please let me know your thoughts on above?
After a bit of testing, I'm positive that my first version already fixes the problem for good. The use of Atomic Grouping effectively prevents backtracking in the second group (which matches the tag). The same can be achieved with a possessive quantifier:
# v-- This "+" does the trick
text.gsub!( /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?)>/ ) do |m|
In this case, the \w*+
part (and therefore, the whole group ([A-Za-z]\w*+)
) will not backtrack. The rest of this regex is not subject to ReDoS (it's linear). So, we can use this version; it should work in Ruby 1.9 and up.
By the way: Ruby 3.2 doesn't even need this patch because it uses some smart caching.
Hi @korny, I did some testing, and as you said above, it is fixing the ReDOS issue, But the output for the old regex(present one in the gem) and the new regex differ,
I gave below as input.
"<a href=https://example.com> Example </a>"
With the old regex, it returns the below output.
<a href="https://example.com"> Example </a>
But with the new regex, it returns the below output.
<a >href=https://example.com> Example </a>>
>
character is added after the tag. Could you please look into it?
I have added attached below the test code I ran
BASIC_TAGS = {
'a' => ['href', 'title'],
'img' => ['src', 'alt', 'title'],
'br' => [],
'i' => nil,
'u' => nil,
'b' => nil,
'pre' => nil,
'kbd' => nil,
'code' => ['lang'],
'cite' => nil,
'strong' => nil,
'em' => nil,
'ins' => nil,
'sup' => nil,
'sub' => nil,
'del' => nil,
'table' => nil,
'tr' => nil,
'td' => ['colspan', 'rowspan'],
'th' => nil,
'ol' => ['start'],
'ul' => nil,
'li' => nil,
'p' => nil,
'h1' => nil,
'h2' => nil,
'h3' => nil,
'h4' => nil,
'h5' => nil,
'h6' => nil,
'blockquote' => ['cite'],
'notextile' => nil
}
def clean_html( text, regex, allowed_tags = BASIC_TAGS )
text.gsub!( /<!\[CDATA\[/, '' )
text.gsub!( regex ) do |m|
raw = $~
tag = raw[2].downcase
if allowed_tags.has_key? tag
pcs = [tag]
allowed_tags[tag].each do |prop|
['"', "'", ''].each do |q|
q2 = ( q != '' ? q : '\s' )
if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
attrv = $1
next if (prop == 'src' or prop == 'href') and not attrv =~ %r{^(http|https|ftp):}
pcs << "#{prop}=\"#{attrv.gsub('"', '\\"')}\""
break
end
end
end if allowed_tags[tag]
"<#{raw[1]}#{pcs.join " "}#{raw[4]}>"
else # Unauthorized tag
if block_given?
yield m
else
''
end
end
end
end
text = "<a href=https://example.com> Example </a>"
old_regex = /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/
new_regex = /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?>?)/
output = clean_html(text,old_regex)
puts "Output for the currently used regex: ",output
output = clean_html(text,new_regex)
puts "Output for the new proposed regex: ",output
I'm sorry, it seems I had a copy/paste error in my comment. The only change I propose is the +
in the tag name group. The change at the end (including the >
in the last group) was nonsense. I updated the comments.
Here's the PR: https://github.com/jgarber/redcloth/pull/75/files.
Thanks for testing this!
Hi @korny
It works fine and fixes the ReDOS issue, Thank you for looking into this.
Thank you @korny and @e23e . I will try to add some tests to cover the scenarios above, merge #75, and release a new version to Rubygems!
@korny : while working on adding tests to ensure the regexp will not take a long time, I've noticed your proposed regexp fails on Recheck playgound (https://makenowjust-labs.github.io/recheck/playground/).
Here is what I get:
However, on my development laptop, I was able to replicate the time spent between the old regexp and the one you provided in #75 .
Do you have any thoughts?
Recheck can’t really read Ruby regexps. It doesn’t support all of the syntax that Ruby provides. Don’t use it to verify Ruby regexps.
Thank you @korny . I will release a new version to Rubygems soon! 🤞
RedCloth v4.3.3 has been release to RubyGems! 🎉 THANK YOU @korny and @e23e !
There is a security vulnerability in this gem. I tried to communicate with the maintainers in an email, but still haven't got a response, I'm raising the issue here.
Vulnerable Code: https://github.com/jgarber/redcloth/blob/v4.3.2/lib/redcloth/formatters/html.rb#L327
The above code has /<(\/)([A-Za-z]\w)([^>]?)(\s?\/?)>/* regex and it matches user-provided input with the regex pattern, the above regex is vulnerable to ReDOS
time ruby test.rb
command and observe the time and CPU taken to process itTest Output:
Fix: