tafia / quick-xml

Rust high performance xml reader and writer
MIT License

Error when parsing !DOCTYPE longer than buffer of BufRead #801

Closed BlueGreenMagick closed 2 months ago

BlueGreenMagick commented 2 months ago

Description

When parsing a <!DOCTYPE []> declaration from a BufRead, the reader mistakes a '>' that belongs to markup inside the DOCTYPE (for example, the end of a comment) for the '>' that closes the DOCTYPE itself. This happens only when the entire DOCTYPE does not fit inside the BufRead's buffer.

Because the DOCTYPE is considered closed too early, everything after that '>' is parsed incorrectly.
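To make the failure mode concrete, here is a simplified model, not quick-xml's actual code. The scanner counts how deeply it is nested in '<' ... '>' pairs and treats the first '>' at depth 0 as the end of the DOCTYPE. Input is fed in fixed-size chunks, the way a BufRead hands out its buffer. The function find_doctype_end and its keep_depth switch are hypothetical, and the sketch assumes the bug amounts to the nesting depth being forgotten whenever the buffer is refilled. It also ignores '>' characters inside quoted literals.

```rust
/// Returns the byte offset of the '>' the scanner believes closes the DOCTYPE.
/// Everything after the opening '<' is fed in `chunk_size`-byte pieces
/// (`chunk_size` must be > 0). With `keep_depth == false` the nesting depth
/// is reset at every chunk, modelling state lost on a buffer refill.
fn find_doctype_end(input: &[u8], chunk_size: usize, keep_depth: bool) -> Option<usize> {
    let body = input.get(1..)?; // skip the '<' that opened `<!DOCTYPE`
    let mut depth: u32 = 0;
    let mut offset = 1; // offset of the current chunk within `input`
    for chunk in body.chunks(chunk_size) {
        if !keep_depth {
            depth = 0; // hypothetical buggy behaviour: depth forgotten on refill
        }
        for (i, &b) in chunk.iter().enumerate() {
            match b {
                b'<' => depth += 1,                             // nested markup opens
                b'>' if depth == 0 => return Some(offset + i),  // DOCTYPE ends here
                b'>' => depth -= 1,                             // nested markup closes
                _ => {}
            }
        }
        offset += chunk.len();
    }
    None // no closing '>' found
}

fn main() {
    let xml = br#"<!DOCTYPE X [<!-- comment --><!ENTITY a "a">]>"#;
    for (chunk_size, keep_depth) in [(4, false), (4, true), (1024, false)] {
        println!(
            "chunk_size={chunk_size} keep_depth={keep_depth} -> end at {:?} (input length {})",
            find_doctype_end(xml, chunk_size, keep_depth),
            xml.len()
        );
    }
}
```

With 4-byte chunks and the depth reset on each refill, the scanner stops at offset 28, the '>' of "-->". Keeping the depth across chunks, or using a 1024-byte chunk that holds the whole input, yields offset 45, the final '>'.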

Steps to reproduce

use quick_xml::events::Event;
use quick_xml::Reader;
use std::io::BufReader;

fn main() {
    let xml = r#"<!DOCTYPE X [<!-- comment --><!ENTITY a "a">]>"#;
    let reader = BufReader::with_capacity(4, xml.as_bytes());
    let mut xml_reader = Reader::from_reader(reader);
    let mut buf = vec![];
    loop {
        match xml_reader.read_event_into(&mut buf).unwrap() {
            Event::Eof => break,
            _ => {}
        }
        // Reuse the buffer for the next event instead of letting it grow.
        buf.clear();
    }
    println!("Parsed xml without error");
}

The code above fails with the error Syntax(InvalidBangMarkup). The '>' that ends the comment <!-- comment --> is mistaken for the end of the DOCTYPE, so the reader then tries to parse <!ENTITY a "a"> as top-level markup, which is not valid there and triggers the error.
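The error itself follows from how <! markup is dispatched. A hedged sketch, assuming a reader that, outside a DOCTYPE, recognises only comments (<!--), CDATA sections (<![CDATA[) and the DOCTYPE (<!DOCTYPE) after <!. Once the DOCTYPE is wrongly closed, the next markup the reader sees is <!ENTITY a "a">, which matches none of them. The names classify_bang, Bang and InvalidBangMarkup are hypothetical (the last only echoes the error name above); quick-xml's real check may differ in detail.

```rust
/// Kinds of `<!` markup this simplified reader accepts outside a DOCTYPE.
#[derive(Debug, PartialEq)]
enum Bang {
    Comment,
    CData,
    DocType,
}

/// Hypothetical error mirroring the `InvalidBangMarkup` reported in the issue.
#[derive(Debug, PartialEq)]
struct InvalidBangMarkup;

/// Classifies markup that starts with `<!` by its prefix.
fn classify_bang(markup: &[u8]) -> Result<Bang, InvalidBangMarkup> {
    if markup.starts_with(b"<!--") {
        Ok(Bang::Comment)
    } else if markup.starts_with(b"<![CDATA[") {
        Ok(Bang::CData)
    } else if markup.starts_with(b"<!DOCTYPE") {
        Ok(Bang::DocType)
    } else {
        // `<!ENTITY`, `<!ELEMENT`, ... belong in a DOCTYPE's internal subset,
        // so this simplified reader rejects them anywhere else.
        Err(InvalidBangMarkup)
    }
}

fn main() {
    let xml = br#"<!DOCTYPE X [<!-- comment --><!ENTITY a "a">]>"#;
    // Offset 29 is the byte right after the "-->" whose '>' was mistaken
    // for the end of the DOCTYPE.
    let rest = &xml[29..];
    println!("{:?}", classify_bang(xml)); // Ok(DocType)
    println!("{:?}", classify_bang(rest)); // Err(InvalidBangMarkup)
}
```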

However, parsing succeeds if the BufReader capacity is larger than the whole input (e.g. 1024 bytes).

I'll open a PR with a fix right away.

Mingun commented 2 months ago

Yes, that is a known bug, #590. I'm closing this as a duplicate.