Perhaps the title isn't quite clear here. As a concrete example, a document with a single tag whose contents end at offset 8192 will cause quick-xml to emit only a Start and an End event, with no Text event in between. This can be verified with the following Rust program, which simply dumps all received events to stderr:
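use quick_xml::events::Event;
use quick_xml::reader::Reader;

fn main() {
    // Reader::from_file wraps the file in a buffered reader.
    let mut reader = Reader::from_file("test.xml").unwrap();
    let mut buf = Vec::new();
    loop {
        // Reuse the buffer between events.
        buf.clear();
        let ev = reader.read_event_into(&mut buf);
        if matches!(ev, Ok(Event::Eof)) {
            break;
        }
        // Dump every event (including errors) to stderr.
        dbg!(&ev);
    }
}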
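For anyone who wants to reproduce this, an input file of the right shape can be generated with something like the sketch below. It is an assumption about the layout rather than the exact script I used: the tag name `a`, the filler byte `x`, and the helper `build_document` are all made up for illustration. The 8192 boundary is chosen because that appears to be the current default capacity of `std::io::BufReader`, which may differ on other platforms or change in future.

```rust
use std::fs;
use std::io;

/// Offset at which the closing tag should start. Assumed to match the
/// default `std::io::BufReader` capacity (8 KiB at the time of writing).
const BOUNDARY: usize = 8192;

/// Hypothetical helper: builds `<a>xxx...x</a>` so that the text content
/// ends exactly at byte offset `BOUNDARY`.
fn build_document() -> Vec<u8> {
    let open = b"<a>";
    let close = b"</a>";
    let mut doc = Vec::with_capacity(BOUNDARY + close.len());
    doc.extend_from_slice(open);
    // Pad with filler text until the boundary is reached.
    doc.resize(BOUNDARY, b'x');
    doc.extend_from_slice(close);
    doc
}

fn main() -> io::Result<()> {
    let doc = build_document();
    // The `<` of the closing tag sits exactly at the boundary offset.
    assert_eq!(doc[BOUNDARY], b'<');
    fs::write("test.xml", &doc)
}
```

Running this writes `test.xml` into the current directory, which is the file name the reproducer above expects.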
The root cause seems to be in read_text in buffered_reader.rs. In this case, reading the text data takes two loop iterations. On the first iteration, < is not found in the buffered data, so all of it is pushed onto buf and the loop continues. On the second iteration, < is found at position 0, which triggers the special case that returns ReadTextResult::Markup instead of ReadTextResult::UpToMarkup, so nothing tells the caller that text data was also read.
I think the right fix here would be to only check the zero-position case on the first iteration of the loop.
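To make the failure mode and the proposed fix concrete, here is a toy model of the loop. This is not quick-xml's actual code: `TextResult`, `read_text_model`, `run`, and the `check_first_only` flag are hypothetical names, and a 4-byte `BufReader` stands in for the real buffer so that `</a>` lands exactly on a refill boundary.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Simplified stand-in for the result of reading text (hypothetical type).
#[derive(Debug, PartialEq)]
enum TextResult {
    /// `<` was the first byte seen: reported as "no text before markup".
    Markup,
    /// Text was collected into `buf` before reaching `<`.
    UpToMarkup,
    /// Input ended before any `<` was found.
    UpToEof,
}

/// Toy version of the text-reading loop. With `check_first_only == false`
/// the position-0 special case fires on every iteration (the behaviour
/// described above); with `true` it fires only on the first iteration.
fn read_text_model<R: BufRead>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    check_first_only: bool,
) -> io::Result<TextResult> {
    let mut first = true;
    loop {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            return Ok(TextResult::UpToEof);
        }
        match available.iter().position(|&b| b == b'<') {
            // Special case: markup starts right away.
            Some(0) if first || !check_first_only => return Ok(TextResult::Markup),
            // Markup found later (or at 0 after text was already collected).
            Some(i) => {
                buf.extend_from_slice(&available[..i]);
                reader.consume(i);
                return Ok(TextResult::UpToMarkup);
            }
            // No markup in this chunk: keep everything and refill.
            None => {
                let len = available.len();
                buf.extend_from_slice(available);
                reader.consume(len);
            }
        }
        first = false;
    }
}

/// Runs the model over `input` with the given buffer capacity.
fn run(input: &[u8], capacity: usize, check_first_only: bool) -> (TextResult, Vec<u8>) {
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(input));
    let mut buf = Vec::new();
    let result = read_text_model(&mut reader, &mut buf, check_first_only).unwrap();
    (result, buf)
}

fn main() {
    let (result, text) = run(b"text</a>", 4, false);
    println!("check every iteration: {:?}, collected {:?}", result, String::from_utf8_lossy(&text));
    let (result, text) = run(b"text</a>", 4, true);
    println!("check first iteration only: {:?}, collected {:?}", result, String::from_utf8_lossy(&text));
}
```

In the first run the model returns `Markup` even though `buf` already holds `text`, which mirrors the missing Text event. In the second run it returns `UpToMarkup` with the same buffer contents. Whether the real fix can be this small depends on how the callers of read_text use the result, which this sketch does not model.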