tafia / quick-xml

Rust high performance xml reader and writer
MIT License
1.18k stars 235 forks source link

Disabling `check_end_names` is not effective since `v0.32` #770

Closed torokati44 closed 3 months ago

torokati44 commented 3 months ago

With v0.31, given this code:

use quick_xml::events::Event;
use quick_xml::reader::Reader;

fn main() {
    let xml = "<b>a<i>b</b>c</i>d</b>e";

    let mut reader = Reader::from_str(xml);
    reader.check_end_names(false);

    let mut buf = Vec::new();

    loop {
        match reader.read_event_into(&mut buf) {
            Ok(Event::Eof) => break,
            Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
            Ok(e) => println!("{:?}", e),
        }
    }
}

The output is this:

Start(BytesStart { buf: Borrowed("b"), name_len: 1 })
Text(BytesText { content: Borrowed("a") })
Start(BytesStart { buf: Borrowed("i"), name_len: 1 })
Text(BytesText { content: Borrowed("b") })
End(BytesEnd { name: Borrowed("b") })
Text(BytesText { content: Borrowed("c") })
End(BytesEnd { name: Borrowed("i") })
Text(BytesText { content: Borrowed("d") })
End(BytesEnd { name: Borrowed("b") })
Text(BytesText { content: Borrowed("e") })

Starting with v0.32, with this slightly adapted code:

use quick_xml::events::Event;
use quick_xml::reader::Reader;

fn main() {
    let xml = "<b>a<i>b</b>c</i>d</b>e";

    let mut reader = Reader::from_str(xml);
    reader.config_mut().check_end_names = false;

    let mut buf = Vec::new();

    loop {
        match reader.read_event_into(&mut buf) {
            Ok(Event::Eof) => break,
            Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
            Ok(e) => println!("{:?}", e),
        }
    }
}

The output is:

Start(BytesStart { buf: Borrowed("b"), name_len: 1 })
Text(BytesText { content: Borrowed("a") })
Start(BytesStart { buf: Borrowed("i"), name_len: 1 })
Text(BytesText { content: Borrowed("b") })
End(BytesEnd { name: Borrowed("b") })
Text(BytesText { content: Borrowed("c") })
End(BytesEnd { name: Borrowed("i") })
Text(BytesText { content: Borrowed("d") })
thread 'main' panicked at src/main.rs:15:23:
Error at position 22: IllFormed(UnmatchedEndTag("b"))
torokati44 commented 3 months ago

This causes test failures for us here: https://github.com/ruffle-rs/ruffle/pull/16774

torokati44 commented 3 months ago

Regression introduced in https://github.com/tafia/quick-xml/pull/675 (https://github.com/tafia/quick-xml/commit/a4febadb63ed100fff0f00a479a126b89426032a).

danielhjacobs commented 3 months ago

Looking at that commit, that looks to be a purposeful change. As has happened for us before, the issue seems to be that Adobe Flash Player, and especially AVM1, is particularly lenient with what it allows in XML, and we need to match that behavior. Maybe @Mingun can add a separate check that we could disable to get the old behavior back.

torokati44 commented 3 months ago

FWIW "not allowed even in HTML" is not the strongest of arguments. As seen above, there are lots of "XML-ish" formats in the wild, not just HTML. As long as it's a somewhat comprehensible sequence of XML "events", I don't see why having a dangling close tag shouldn't be reported as is, like it was before.

Mingun commented 3 months ago

Yes, I think, new config option should solve this problem. Feel free to make a PR.

Don't you think to watch the project more closely and review PRs (I do not have rights to manage the list of official reviewers from whom I can request a review in GitHub UI, although)? Reviews from users with real life projects could be helpful.

I have some plans to increase reader compliance in the near future and the first step is #766.