When parsing <!DOCTYPE []> from a BufRead, the reader can mistake a '>' that occurs inside the doctype for the '>' that closes the DOCTYPE. This happens when the entire doctype does not fit inside the BufRead's buffer.
Once a '>' has been wrongly taken as the end of the doctype, the content after it is parsed incorrectly.
Steps to reproduce
use quick_xml::events::Event;
use quick_xml::Reader;
use std::io::BufReader;

fn main() {
    let xml = r#"<!DOCTYPE X [<!-- comment --><!ENTITY a "a">]>"#;
    // A 4-byte buffer is far smaller than the doctype, so it spans many refills.
    let reader = BufReader::with_capacity(4, xml.as_bytes());
    let mut xml_reader = Reader::from_reader(reader);
    let mut buf = vec![];
    loop {
        // unwrap() panics on the first parse error.
        match xml_reader.read_event_into(&mut buf).unwrap() {
            Event::Eof => break,
            _ => {}
        }
    }
    println!("Parsed xml without error");
}
The code above fails with the error Syntax(InvalidBangMarkup). The '>' that ends the comment <!-- comment --> is mistakenly parsed as the end of the DOCTYPE, so the following <!ENTITY> is parsed outside the doctype and produces the error.
However, parsing succeeds if the BufReader capacity is increased beyond the length of the input (e.g. 1024).
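To illustrate the kind of failure described above, here is a minimal standalone sketch. It is not quick-xml's actual code: `doctype_end` and its `carry_state` flag are hypothetical names, and the scanner only assumes that the end of a doctype is found by balancing '<' against '>'. It shows how forgetting the nesting depth between buffer refills makes the comment's '>' look like the end of the DOCTYPE.

```rust
/// Hypothetical model of a chunked doctype scanner (not quick-xml's code).
/// Returns the absolute byte offset of the '>' taken as the end of the
/// DOCTYPE. `chunk_size` stands in for the BufRead capacity.
/// With `carry_state == false` the nesting depth is forgotten at every
/// chunk boundary, which models the suspected bug.
fn doctype_end(input: &[u8], chunk_size: usize, carry_state: bool) -> Option<usize> {
    let mut depth: i32 = 0; // number of '<' not yet matched by a '>'
    let mut offset = 0; // absolute offset of the current chunk
    for chunk in input.chunks(chunk_size) {
        if !carry_state {
            depth = 0; // state lost between buffer refills
        }
        for (i, &b) in chunk.iter().enumerate() {
            match b {
                b'<' => depth += 1,
                b'>' => {
                    depth -= 1;
                    if depth <= 0 {
                        // No open '<' is left, so this '>' is treated as the end.
                        return Some(offset + i);
                    }
                }
                _ => {}
            }
        }
        offset += chunk.len();
    }
    None
}
```

For the reproduction string, `doctype_end(xml, 4, false)` returns `Some(28)`, the '>' that ends the comment, while `doctype_end(xml, 4, true)` and `doctype_end(xml, 1024, false)` both return `Some(45)`, the real end of the DOCTYPE. This simple balance count ignores '>' inside quoted values or comments, so it is only a model of the symptom, not a complete doctype parser.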
I'll open a PR with the fix right away.