ReDOS possible in the sanitize_html function

merbinr commented 1 year ago

There is a security vulnerability in this gem. I tried to communicate with the maintainers in an email, but still haven't got a response, I'm raising the issue here.

Vulnerable Code: https://github.com/jgarber/redcloth/blob/v4.3.2/lib/redcloth/formatters/html.rb#L327

The above code has /<(\/)([A-Za-z]\w)([^>]?)(\s?\/?)>/* regex and it matches user-provided input with the regex pattern, the above regex is vulnerable to ReDOS

Run the test.rb (provided below) with the time ruby test.rb command and observe the time and CPU taken to process it
RedCloth.new(text, [:sanitize_html]).to_html function will trigger the same function, so run the below code and observe the processing time

require 'RedCloth'
text = '<A' + 'A' * (54773)
text = RedCloth.new(text, [:sanitize_html]).to_html

Test Output:

Fix:

Rebuild the regex used as not vulnerable to ReDOS
You can use Recheck to check if a regex is vulnerable to ReDOS

carnil commented 1 year ago

Apparently CVE-2023-31606 is assigned for this issue.

korny commented 1 year ago

@e23e Thanks so much for investigating this! We're using it on our platform, and would like to stay with Textile if possible. Do you have a fix for the issue already? If not, I think I can help with the RegExp, and then we could at least create a fork 🤔

korny commented 1 year ago

Changing the regex to:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

already speeds this up by 30x.

Edit: The version below doesn't work! Use this one instead: https://github.com/jgarber/redcloth/issues/73#issuecomment-1612198953.

Allowing open tags (without a >) fixes the issue completely, but this might change the behavior in unexpected ways:

    text.gsub!( /<(\/*)(?>[A-Za-z]\w*)([^>]*?)(\s?\/?>?)/ ) do |m|
…
        "<#{raw[1]}#{pcs.join " "}#{raw[4]}"

merbinr commented 1 year ago

Hi @korny Thank you for looking into it. I appreciate your thoughts on fixing this. The solution you provided above fixes the issue partially, but I would like to see if we can fix it completely,

 def clean_html( text, allowed_tags = BASIC_TAGS )
     text.gsub!( /<!\[CDATA\[/, '' )
     text.gsub!( /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/ ) do |m|

The clean_html function is used to clean the not allowed tags. I'm not an expert in building regex. I guess this regex detects tags in a string/paragraph. So if you could build a regex without backtracking that detects the tags, we can completely fix this.

Could you please let me know your thoughts on above?

korny commented 1 year ago

After a bit of testing, I'm positive that my first version already fixes the problem for good. The use of Atomic Grouping effectively prevents backtracking in the second group (which matches the tag). The same can be achieved with a possessive quantifier:

    #                              v-- This "+" does the trick
    text.gsub!( /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?)>/ ) do |m|

In this case, the \w*+ part (and therefore, the whole group ([A-Za-z]\w*+)) will not backtrack. The rest of this regex is not subject to ReDoS (it's linear). So, we can use this version; it should work in Ruby 1.9 and up.

By the way: Ruby 3.2 doesn't even need this patch because it uses some smart caching.

merbinr commented 1 year ago

Hi @korny, I did some testing, and as you said above, it is fixing the ReDOS issue, But the output for the old regex(present one in the gem) and the new regex differ,

I gave below as input.

"<a href=https://example.com> Example </a>"

With the old regex, it returns the below output.

<a href="https://example.com"> Example </a>

But with the new regex, it returns the below output.

<a >href=https://example.com> Example </a>>

> character is added after the tag. Could you please look into it?

I have added attached below the test code I ran

BASIC_TAGS = {
    'a' => ['href', 'title'],
    'img' => ['src', 'alt', 'title'],
    'br' => [],
    'i' => nil,
    'u' => nil, 
    'b' => nil,
    'pre' => nil,
    'kbd' => nil,
    'code' => ['lang'],
    'cite' => nil,
    'strong' => nil,
    'em' => nil,
    'ins' => nil,
    'sup' => nil,
    'sub' => nil,
    'del' => nil,
    'table' => nil,
    'tr' => nil,
    'td' => ['colspan', 'rowspan'],
    'th' => nil,
    'ol' => ['start'],
    'ul' => nil,
    'li' => nil,
    'p' => nil,
    'h1' => nil,
    'h2' => nil,
    'h3' => nil,
    'h4' => nil,
    'h5' => nil,
    'h6' => nil, 
    'blockquote' => ['cite'],
    'notextile' => nil
}

def clean_html( text, regex, allowed_tags = BASIC_TAGS )
    text.gsub!( /<!\[CDATA\[/, '' )
    text.gsub!( regex ) do |m|
      raw = $~
      tag = raw[2].downcase
      if allowed_tags.has_key? tag
        pcs = [tag]
        allowed_tags[tag].each do |prop|
          ['"', "'", ''].each do |q|
            q2 = ( q != '' ? q : '\s' )
            if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
              attrv = $1
              next if (prop == 'src' or prop == 'href') and not attrv =~ %r{^(http|https|ftp):}
              pcs << "#{prop}=\"#{attrv.gsub('"', '\\"')}\""
              break
            end
          end
        end if allowed_tags[tag]
        "<#{raw[1]}#{pcs.join " "}#{raw[4]}>"
      else # Unauthorized tag
        if block_given?
          yield m
        else
          ''
        end
      end
    end
end

text = "<a href=https://example.com> Example </a>"
old_regex = /<(\/*)([A-Za-z]\w*)([^>]*?)(\s?\/?)>/
new_regex = /<(\/*)([A-Za-z]\w*+)([^>]*?)(\s?\/?>?)/
output = clean_html(text,old_regex)
puts "Output for the currently used regex: ",output
output = clean_html(text,new_regex)
puts "Output for the new proposed regex: ",output

korny commented 1 year ago

I'm sorry, it seems I had a copy/paste error in my comment. The only change I propose is the + in the tag name group. The change at the end (including the > in the last group) was nonsense. I updated the comments.

Here's the PR: https://github.com/jgarber/redcloth/pull/75/files.

Thanks for testing this!

merbinr commented 1 year ago

Hi @korny

It works fine and fixes the ReDOS issue, Thank you for looking into this.

heliocola commented 1 year ago

Thank you @korny and @e23e . I will try to add some tests to cover the scenarios above, merge #75, and release a new version to Rubygems!

heliocola commented 1 year ago

@korny : while working on adding tests to ensure the regexp will not take a long time, I've noticed your proposed regexp fails on Recheck playgound (https://makenowjust-labs.github.io/recheck/playground/).

Here is what I get:

However, on my development laptop, I was able to replicate the time spent between the old regexp and the one you provided in #75 .

Do you have any thoughts?

korny commented 1 year ago

Recheck can’t really read Ruby regexps. It doesn’t support all of the syntax that Ruby provides. Don’t use it to verify Ruby regexps.

heliocola commented 1 year ago

Thank you @korny . I will release a new version to Rubygems soon! 🤞

heliocola commented 1 year ago

RedCloth v4.3.3 has been release to RubyGems! 🎉 THANK YOU @korny and @e23e !

jgarber / redcloth

ReDOS possible in the sanitize_html function #73