Closed acuifex closed 1 year ago
This is an interesting bug which I did not encounter two years ago.
Here's a minimal reproducible example:
use std::io::BufReader;

use xml::{
    reader::{EventReader, XmlEvent},
    ParserConfig,
};

fn main() -> std::io::Result<()> {
    // 14 <item> elements, each containing an empty CDATA section.
    let data = r#"<root>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
<item><![CDATA[]]></item>
</root>"#;

    // The panic only shows up with cdata_to_characters enabled.
    let config = ParserConfig::new().cdata_to_characters(true);
    let reader = BufReader::new(data.as_bytes());
    let parser = EventReader::new_with_config(reader, config);

    // Print the element tree, indented by nesting depth.
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            _ => {}
        }
    }
    Ok(())
}
Turns out that the panic occurs after exactly 14 empty CDATA values with cdata_to_characters enabled. I really wonder why that is...
I have now bisected the older versions where it still worked and found the commit that introduced the regression: https://github.com/netvl/xml-rs/commit/444a7c2535d362cc71d9461ffb36d3197fab3100
I'm not sure what this change does, but the hard-coded value of 16 there is very close to the magic number 14 that makes the event reader panic.
This issue has always been happening, but instead of being caught, it was silently causing incorrect line numbers and ever-increasing memory usage.
The max depth for push_pos should be finite, because it's a stack of the XML syntax constructs currently being parsed (for example, an element has an attribute, the attribute has an entity, and XML syntax does not nest any deeper than that).
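To make the reasoning above concrete, here is a minimal sketch of a bounded position stack. It is not xml-rs's actual code: `PosStack`, `pop_pos`, `simulate`, and the `(row, column)` tuple are hypothetical names chosen for illustration, and only the `push_pos` name and the limit of 16 come from this thread. It assumes that every construct pushes one position when it starts and pops it when it ends, and shows how a single code path that forgets the pop eventually hits the bound.

```rust
/// Hypothetical bounded stack of source positions (not the real xml-rs type).
struct PosStack {
    entries: Vec<(u64, u64)>, // (row, column) of each construct still being parsed
    max_depth: usize,
}

impl PosStack {
    fn new(max_depth: usize) -> Self {
        Self { entries: Vec::with_capacity(max_depth), max_depth }
    }

    /// Records the start of a syntax construct; fails once the bound is reached.
    fn push_pos(&mut self, pos: (u64, u64)) -> Result<(), String> {
        if self.entries.len() >= self.max_depth {
            return Err(format!("position stack overflow at depth {}", self.entries.len()));
        }
        self.entries.push(pos);
        Ok(())
    }

    /// Drops the innermost position once its construct has been fully parsed.
    fn pop_pos(&mut self) -> Option<(u64, u64)> {
        self.entries.pop()
    }
}

/// Simulates parsing `count` sibling constructs one after another.
/// With `leak == true` the matching pop is skipped, as a buggy code path would do.
/// Returns Ok(final stack depth), or Err(number of constructs handled before overflow).
fn simulate(count: usize, max_depth: usize, leak: bool) -> Result<usize, usize> {
    let mut stack = PosStack::new(max_depth);
    for i in 0..count {
        if stack.push_pos((i as u64 + 1, 0)).is_err() {
            return Err(i);
        }
        if !leak {
            stack.pop_pos();
        }
    }
    Ok(stack.entries.len())
}

fn main() {
    // Balanced pushes and pops: the depth never grows, however long the input is.
    println!("balanced: {:?}", simulate(1000, 16, false));
    // One leaked entry per construct: the bound is hit after a fixed number of items.
    println!("leaky:    {:?}", simulate(1000, 16, true));
}
```

Without the bound, the leaky variant would just keep growing the vector and report stale positions, which would match the "incorrect line numbers and ever-increasing memory usage" described above. Whether the real stack already holds a few entries when the first CDATA section is reached is not something this sketch tries to model.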
Hi!
I'm doing what the message told me. Note that I'm not a Rust dev and I have no idea what I'm doing.
I'm messing around with this project: https://github.com/NeKzor/lp
serde-xml-rs has a similar issue: https://github.com/RReverser/serde-xml-rs/issues/205
Package versions pulled from Cargo.lock:
serde-xml-rs: 0.6.0
xml-rs: 0.8.15
URL from where the XML was pulled
XML file itself