sporkmonger / addressable

Addressable is an alternative implementation to the URI implementation that is part of Ruby's standard library. It is flexible, offers heuristic parsing, and additionally provides extensive support for IRIs and URI templates.
Apache License 2.0
1.56k stars 265 forks source link

ReDOS when using a variable path and query during extract #364

Open ebenoist opened 4 years ago

ebenoist commented 4 years ago

Template Used

Addressable::Template.new("{scheme}://{host}{/path*}{?query*}")

When the template above extracts a url with an empty query string, the regex used displays immense performance degradation.

template.extract("http://host.com/abcdefghijklmnopqrstuvwxyz?a")

All the time spent (~12 seconds on a modern Macbook Pro) is spent attempting to match the uri with the generated expansion regex.

Bad Regex

/(?-mix:^(?:|(?<scheme>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?):\/\/(?:|(?<host>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\/(?<path>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:\/?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)(?:|\?(?<query>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:&?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)$)/

Reproduction

require "addressable"
require "benchmark"

puts "*" * 30
puts "path with expansion - {scheme}://{host}{/path*}{?query*}"
puts "*" * 30
template = Addressable::Template.new("{scheme}://{host}{/path*}{?query*}")

puts "with one part"
puts Benchmark.measure {
  puts template.extract("http://host.com/abcdefghijklmnopqrstuvwxyz")
}

puts "with one part and empty query - reDDOS"
puts Benchmark.measure {
  puts template.extract("http://host.com/abcdefghijklmnopqrstuvwxyz?a")
}

puts "expand with one part"
puts Benchmark.measure {
  puts template.expand({
    scheme: "https",
    host: "host.com",
    path: "abcdefghijklmnopqrstuvwxyz",
  })
}

puts "expand with one part and empty query"
puts Benchmark.measure {
  puts template.expand({
    scheme: "https",
    host: "host.com",
    path: "abcdefghijklmnopqrstuvwxyz",
    query: { a: nil },
  })
}

# ❥ ruby benchmark.rb
# With one part
#   0.000383   0.000027   0.000410 (  0.000407)
# With one part and empty query
#  12.595742   0.006677  12.602419 ( 12.612499)
# Expand with one part
#   0.000219   0.000003   0.000222 (  0.000223)
# Expand with one part and empty query
#   0.000177   0.000011   0.000188 (  0.000188)

puts "\n\n\n\n"
template = Addressable::Template.new("{scheme}://{host}{/path}{?query*}")
puts "*" * 30
puts "path without expansion - {scheme}://{host}{/path}{?query*}"
puts "*" * 30

puts "with one part"
puts Benchmark.measure {
  puts template.extract("http://host.com/abcdefghijklmnopqrstuvwxyz")
}

puts "with one part and empty query"
puts Benchmark.measure {
  puts template.extract("http://host.com/abcdefghijklmnopqrstuvwxyz?a")
}

puts "expand with one part"
puts Benchmark.measure {
  puts template.expand({
    scheme: "https",
    host: "host.com",
    path: "abcdefghijklmnopqrstuvwxyz",
  })
}

puts "expand with one part and empty query"
puts Benchmark.measure {
  puts template.expand({
    scheme: "https",
    host: "host.com",
    path: "abcdefghijklmnopqrstuvwxyz",
    query: { a: nil },
  })
}

# ❥ ruby benchmark.rb
# With one part
#   0.000383   0.000027   0.000410 (  0.000407)
# With one part and empty query
#  12.595742   0.006677  12.602419 ( 12.612499)
# Expand with one part
#   0.000219   0.000003   0.000222 (  0.000223)
# Expand with one part and empty query
#   0.000177   0.000011   0.000188 (  0.000188)

##
# When using the variable path the following regexp is generated
# EXPANSION_REGEXP_WITH_VARIABLE_PATH = /(?-mix:^(?:|(?<scheme>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?):\/\/(?:|(?<host>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\/(?<path>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:\/?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)(?:|\?(?<query>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:&?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)$)/
#
# When we don't use a variable path the following regexp is generated
# EXPANSION_REGEXP_WITHOUT_VARIABLE_PATH = /(?-mix:^(?:|(?<scheme>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?):\/\/(?:|(?<host>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\/(?<path>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\?(?<query>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:&?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)$)/

puts "\n\n\n\n"

EXPANSION_REGEXP_WITH_VARIABLE_PATH = /(?-mix:^(?:|(?<scheme>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?):\/\/(?:|(?<host>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\/(?<path>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:\/?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)(?:|\?(?<query>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:&?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)$)/
EXPANSION_REGEXP_WITHOUT_VARIABLE_PATH = /(?-mix:^(?:|(?<scheme>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?):\/\/(?:|(?<host>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\/(?<path>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)?)(?:|\?(?<query>(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?(?:&?(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*=(?:[a-zA-Z0-9\-\.\_\~]|%[a-fA-F0-9][a-fA-F0-9])*?)*)?)$)/

puts "bad regexp"
puts Benchmark.measure {
  puts "http://host.com/abcdefghijklmnopqrstuvwxyz?a".match(EXPANSION_REGEXP_WITH_VARIABLE_PATH)
}

puts "\n"

puts "better regexp"
puts Benchmark.measure {
  puts "http://host.com/abcdefghijklmnopqrstuvwxyz?a".match(EXPANSION_REGEXP_WITHOUT_VARIABLE_PATH)
}

# bad regexp
#
#  12.920860   0.016003  12.936863 ( 12.948862)
#
# better regexp
#
#   0.000008   0.000002   0.000010 (  0.000010)

Thank you @joshkrueger for helping isolate the issue.

olofheurgren commented 3 years ago

I noticed the same thing when extracting a URL with a fragment (#) part where the Template does not include a fragment. Execution time seems to grow exponentially as the path length increases.

EDIT: I saw in CHANGELOG.md that version 2.8.0 fixes a ReDoS vulnerability. However, this issue still occurs on 2.8.0.

Demonstration:

require 'addressable/template'
require 'benchmark'

template = Addressable::Template.new('http://www.example.com{/path*}')

(15..30).each do |path_length|
  path = "a"*path_length
  extract_time = Benchmark.realtime do
    template.extract("http://www.example.com/#{path}#fragment")
  end 
  puts "path length: #{path_length}, extract time: #{'%.2f' % extract_time}s"
end

output:

path length: 15, extract time: 0.01s
path length: 16, extract time: 0.01s
path length: 17, extract time: 0.02s
path length: 18, extract time: 0.03s
path length: 19, extract time: 0.07s
path length: 20, extract time: 0.13s
path length: 21, extract time: 0.26s
path length: 22, extract time: 0.52s
path length: 23, extract time: 1.03s
path length: 24, extract time: 2.07s
path length: 25, extract time: 4.15s
path length: 26, extract time: 8.34s
path length: 27, extract time: 16.49s
path length: 28, extract time: 33.17s
path length: 29, extract time: 75.10s
path length: 30, extract time: 142.61s
ebenoist commented 3 years ago

Yup, saw the CVE reported today and figured it was this issue. We've removed the need to pass untrusted URIs to this library, but I can also confirm that the issue still exists.