vikiv480 closed this issue.
We are also encountering the "entity expansion has grown too large" exception on 3.3.4. Reverting to 3.3.2 resolved the issue for us.
Below is an example of the problem you might encounter when using `REXML::Parsers::BaseParser` to parse XML data. I'd be delighted if it helps in any way.
```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "rexml", path: "~/work/ruby/rexml" # local checkout; adjust the path to your own clone
end

require 'rexml/parsers/baseparser'

xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
<e type="test" ver="1">
Commiter List: &quot;A&quot; of 1 commit, &quot;B&quot; of 2 commits, &quot;C&quot; of 3 commits, &quot;D&quot; of 4 commits,
&quot;E&quot; of 5 commits, &quot;F&quot; of 6 commits, &quot;G&quot; of 7 commits, &quot;H&quot; of 8 commits,
&quot;I&quot; of 9 commits, &quot;J&quot; of 10 commits, &quot;K&quot; of 11 commits, &quot;L&quot; of 12 commits,
&quot;M&quot; of 13 commits, &quot;N&quot; of 14 commits, &quot;O&quot; of 15 commits, &quot;P&quot; of 16 commits,
&quot;Q&quot; of 17 commits, &quot;R&quot; of 18 commits, &quot;S&quot; of 19 commits, &quot;T&quot; of 20 commits,
&quot;U&quot; of 21 commits, &quot;V&quot; of 22 commits, &quot;W&quot; of 23 commits, &quot;X&quot; of 24 commits,
&quot;Y&quot; of 25 commits, &quot;Z&quot; of 26 commits.
</e>
</root>
XML

parser = REXML::Parsers::BaseParser.new('') # empty source; only #unnormalize is exercised here
parser.unnormalize(xml)
```
```diff
$ git diff
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..0d235ba 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -549,6 +549,7 @@ module REXML
matches.collect!{|x|x[0]}.compact!
if matches.size > 0
sum = 0
+ p matches
matches.each do |entity_reference|
unless filter and filter.include?(entity_reference)
entity_value = entity( entity_reference, entities )
@@ -556,6 +557,7 @@ module REXML
re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
rv.gsub!( re, entity_value )
sum += rv.bytesize
+ p sum
if sum > Security.entity_expansion_text_limit
raise "entity expansion has grown too large"
end
```
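To illustrate why `sum` keeps growing, here is a small standalone sketch in plain Ruby (no REXML involved; the `/&(\w+);/` pattern is a simplified stand-in I'm assuming for REXML's own reference regex). `scan` keeps one entry per occurrence, while a single `gsub!` already replaces every occurrence, so later iterations over the duplicate entries have nothing left to substitute but still add the full size:

```ruby
# Simplified stand-in for REXML's reference pattern (assumption: names are \w+).
REF = /&(\w+);/

rv = 'Commiter List: &quot;A&quot; of 1 commit, &quot;B&quot; of 2 commits.'.dup

# scan returns one entry per occurrence, so duplicates are kept
matches = rv.scan(REF).map(&:first)
# => ["quot", "quot", "quot", "quot"]

first  = rv.gsub!(/&quot;/, '"') # replaces ALL four occurrences; returns rv
second = rv.gsub!(/&quot;/, '"') # nothing left to replace; returns nil

puts matches.inspect
puts rv
p second
```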
$ ruby xml_parser.rb
["quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot"]
652
1304
1956
2608
3260
3912
4564
5216
5868
6520
7172
7824
8476
9128
9780
10432
/home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:562:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `each'
from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `unnormalize'
from xml_parser.rb:26:in `<main>'
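The printed totals can be replayed with a few lines of arithmetic. This sketch assumes the 652-byte expanded size and the 52 matches shown in the output, plus REXML's default `Security.entity_expansion_text_limit` of 10,240 bytes:

```ruby
# Replays the running total printed by the debug `p sum` lines.
# Assumptions: the expanded string stays at 652 bytes (as printed), there are
# 52 matches, and the limit is REXML's default of 10_240 bytes.
EXPANDED_SIZE = 652
MATCH_COUNT   = 52
LIMIT         = 10_240

sum = 0
failed_at = nil
MATCH_COUNT.times do |i|
  sum += EXPANDED_SIZE # the whole string is re-counted for every match
  if sum > LIMIT
    failed_at = i + 1
    break
  end
end

puts "raised on match #{failed_at} with sum=#{sum}"
```

The expanded text itself is only 652 bytes; the error comes from re-adding that size once per duplicate match.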
Describe the bug
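I'm not completely familiar with this repo, so please correct me if I'm wrong. I suspect `sum` is calculated incorrectly in `#unnormalize`: `rv.bytesize` is added once per entry in `matches`, even for references that have already been substituted. `matches` holds one entry per occurrence, so with 52 `&quot;` references the size of the whole string is added up to 52 times, even though the first `gsub!` already replaced every occurrence. https://github.com/ruby/rexml/blob/e3f747fb4fe30f5c890a4bea5b12dd72f595e6b3/lib/rexml/parsers/baseparser.rb#L550-L569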
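One possible direction, sketched as a standalone helper rather than a patch to REXML: `unnormalize_sketch`, the simplified `REFERENCE` pattern, and the `entities` hash are hypothetical names for illustration, and the 10,240-byte limit mirrors REXML's default. It substitutes each distinct reference once and compares the actual size of the expanded text against the limit, instead of accumulating `rv.bytesize` for every occurrence. This is not necessarily how the maintainers would fix it:

```ruby
# Hypothetical helper, not REXML's API: expands entity references in `string`
# using the `entities` hash, failing if the expanded text exceeds `limit` bytes.
REFERENCE = /&(\w+);/ # simplified stand-in for REXML's reference pattern

def unnormalize_sketch(string, entities, limit:)
  rv = string.dup
  # One entry per *distinct* reference: a single gsub! below already replaces
  # every occurrence, so repeated names would only re-count the same text.
  names = rv.scan(REFERENCE).map(&:first).uniq
  names.each do |name|
    value = entities[name]
    next unless value # leave unknown references untouched
    rv.gsub!("&#{name};", value)
    # Measure the text as it actually is now instead of accumulating sizes.
    raise "entity expansion has grown too large" if rv.bytesize > limit
  end
  rv
end

entities = { "quot" => '"' } # assumption: only &quot; needs expanding here
text = "Commiter List: " + ("&quot;A&quot; of 1 commit, " * 26)
expanded = unnormalize_sketch(text, entities, limit: 10_240)
puts expanded.bytesize
```

Checking `rv.bytesize` after each substitution keeps the early abort of the original loop while measuring what was actually produced. The sketch ignores REXML details such as numeric character references and the `filter` argument.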
How to reproduce
Error:
Suggestion/fix
Result: