ruby / rexml

REXML is an XML toolkit for Ruby
BSD 2-Clause "Simplified" License
137 stars 63 forks source link

Incorrectly calculated `sum` in `#unnormalize` #193

Closed vikiv480 closed 1 month ago

vikiv480 commented 2 months ago

Describe the bug

I'm not completely familiar with this repo so please enlighten me if I'm wrong. I suspect sum is calculated incorrectly in #unnormalize. rv.bytesize is added multiple times over, even for matches that has already been substituted.

https://github.com/ruby/rexml/blob/e3f747fb4fe30f5c890a4bea5b12dd72f595e6b3/lib/rexml/parsers/baseparser.rb#L550-L569


How to reproduce

require "rexml/parsers/baseparser"

entity_less_than = "<"
entitiy_length = 100

filler_text = "A"
filler_length = 100

feed = "#{entity_less_than * entitiy_length}#{filler_text * filler_length}"

base_parser = REXML::Parsers::BaseParser.new("")
base_parser.unnormalize(feed) # => "<" * 100 + "A" * 100

Error:

❯ bundle exec ruby test.rb
/Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
    from /Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:552:in `each'
    from /Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:552:in `unnormalize'
    from test.rb:13:in `<main>'

Suggestion/fix

diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..e6d6ae6 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -549,7 +549,7 @@ module REXML
         matches.collect!{|x|x[0]}.compact!
         if matches.size > 0
           sum = 0
-          matches.each do |entity_reference|
+          matches.uniq.each do |entity_reference|
             unless filter and filter.include?(entity_reference)
               entity_value = entity( entity_reference, entities )
               if entity_value
@@ -557,7 +557,7 @@ module REXML
                 rv.gsub!( re, entity_value )
                 sum += rv.bytesize
                 if sum > Security.entity_expansion_text_limit
-                  raise "entity expansion has grown too large"
+                  # raise "entity expansion has grown too large"
                 end
               else
                 er = DEFAULT_ENTITIES[entity_reference]
@@ -566,6 +566,7 @@ module REXML
             end
           end
           rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
+          puts "Sum of entity expansions: #{sum}"
         end
         rv
       end

Result:

# Before change
❯ bundle exec ruby test.rb
Sum of entity expansions: 20000

# After change
❯ bundle exec ruby test.rb
Sum of entity expansions: 200
dstarner commented 2 months ago

We are also encountering the entity expansion has grown too large exception on 3.3.4. Reverting back to 3.3.2 resolved the issue for us.

otegami commented 2 months ago

Below is an example of a problem you might encounter while using REXML::Parsers::BaseParser to parse XML data. If this helps in any way, I'd be delighted.

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "rexml", path: "~/work/ruby/rexml"
end

require 'rexml/parsers/baseparser'

xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <e type="test" ver="1">
        Commiter List: &quot;A&quot; of 1 commit, &quot;B&quot; of 2 commits, &quot;C&quot; of 3 commits, &quot;D&quot; of 4 commits, 
        &quot;E&quot; of 5 commits, &quot;F&quot; of 6 commits, &quot;G&quot; of 7 commits, &quot;H&quot; of 8 commits, 
        &quot;I&quot; of 9 commits, &quot;J&quot; of 10 commits, &quot;K&quot; of 11 commits, &quot;L&quot; of 12 commits, 
        &quot;M&quot; of 13 commits, &quot;N&quot; of 14 commits, &quot;O&quot; of 15 commits, &quot;P&quot; of 16 commits, 
        &quot;Q&quot; of 17 commits, &quot;R&quot; of 18 commits, &quot;S&quot; of 19 commits, &quot;T&quot; of 20 commits, 
        &quot;U&quot; of 21 commits, &quot;V&quot; of 22 commits, &quot;W&quot; of 23 commits, &quot;X&quot; of 24 commits, 
        &quot;Y&quot; of 25 commits, &quot;Z&quot; of 26 commits.
    </e>
</root>
XML

parser = REXML::Parsers::BaseParser.new('')
parser.unnormalize(xml)
$ git diff
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..0d235ba 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -549,6 +549,7 @@ module REXML
         matches.collect!{|x|x[0]}.compact!
         if matches.size > 0
           sum = 0
+          p matches
           matches.each do |entity_reference|
             unless filter and filter.include?(entity_reference)
               entity_value = entity( entity_reference, entities )
@@ -556,6 +557,7 @@ module REXML
                 re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
                 rv.gsub!( re, entity_value )
                 sum += rv.bytesize
+                p sum
                 if sum > Security.entity_expansion_text_limit
                   raise "entity expansion has grown too large"
                 end
$ ruby xml_parser.rb
["quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot"]
652
1304
1956
2608
3260
3912
4564
5216
5868
6520
7172
7824
8476
9128
9780
10432
/home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:562:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
    from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `each'
    from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `unnormalize'
    from xml_parser.rb:26:in `<main>'