Open yorkz1994 opened 1 month ago
I think the failed test is because we did the line-end normalization after the unescape, but the unescaped output can contain \r, so these new \r characters should not be normalized. We have to do the normalization only for the data in the parts that are not unescaped.
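To illustrate the ordering problem, here is a minimal sketch using only the standard library. The names unescape_with_normalization, push_normalized and resolve are hypothetical and are not the crate's API; the sketch only assumes that a character reference such as &#13; must survive as a real \r, while literal \r and \r\n in the input become \n.

```rust
/// Appends `raw` to `out`, turning "\r\n" and lone "\r" into "\n".
fn push_normalized(out: &mut String, raw: &str) {
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' {
            // Swallow the '\n' of a "\r\n" pair so the pair yields a single '\n'.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }
}

/// Resolves the five predefined entities and numeric character references.
fn resolve(name: &str) -> Option<char> {
    match name {
        "lt" => Some('<'),
        "gt" => Some('>'),
        "amp" => Some('&'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        _ => {
            let num = name.strip_prefix('#')?;
            let code = match num.strip_prefix('x') {
                Some(hex) => u32::from_str_radix(hex, 16).ok()?,
                None => num.parse::<u32>().ok()?,
            };
            char::from_u32(code)
        }
    }
}

/// Hypothetical helper: normalizes only the literal segments between
/// references, so a '\r' produced by a reference is left untouched.
fn unescape_with_normalization(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find('&') {
        // Literal text before the reference is normalized.
        push_normalized(&mut out, &rest[..start]);
        let after = &rest[start + 1..];
        let resolved = after
            .find(';')
            .and_then(|end| resolve(&after[..end]).map(|c| (c, end)));
        match resolved {
            // The resolved character is pushed as-is, without normalization.
            Some((c, end)) => {
                out.push(c);
                rest = &after[end + 1..];
            }
            // Unknown or unterminated reference: keep the '&' as literal text.
            None => {
                out.push('&');
                rest = after;
            }
        }
    }
    push_normalized(&mut out, rest);
    out
}
```

With this ordering, the input a\r\nb&#13;c\rd comes out with \n in place of both literal line ends and a real \r where the reference was.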
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 60.36%. Comparing base (3ebe221) to head (d3ee1ad). Report is 3 commits behind head on master.
Good. But I would also like to see tests that use the API of Attribute and BytesText, which are listed here.
Sorry, I don't know what you want me to do. I am not familiar with the XML specification on attributes. What I did is just normalize the line ends before your further unescape work. Removing \r before unescape means it is as if there were no such character in your input at all, so your previous logic should continue to work. Now all your previous cases pass and the added case for testing normalization also works. I don't see anything else I can do.
I would just like to see tests that call the unescape family of methods on an Attribute and a BytesText obtained from the reader, and check that the result does not contain \r. Something like:
// XML with \n \r\n and \r style newlines in various places
const XML: &str = "...";
let mut reader = Reader::from_str(XML);
match reader.read_event().unwrap() {
    Event::Start(event) => {
        let iter = event.attributes();
        let a = iter.next().unwrap();
        #[cfg(not(feature = "encoding"))]
        assert_eq!(a.unescape_value(), "...");
        assert_eq!(a.decode_and_unescape_value(), "...");
    }
    event => panic!("Expected Start, found {:?}", event),
}
match reader.read_event().unwrap() {
    Event::Text(event) => assert_eq!(event.unescape(), "..."),
    event => panic!("Expected Text, found {:?}", event),
}
Like I said, I am not familiar with the XML specification, so I don't know where the proper places to put \r are, or what should be expected after unescape if one appears. For example, is it valid to put a \r in an attribute key or value, or in a tag name? If you know better than I do, how about you try it yourself?
Just add it everywhere that spaces can occur. We are not talking about correctness for now (that is another question; we definitely do not process everything according to the specification). I only want to have a starting point and to be sure that this feature works as assumed when you use the actual API.
Could you provide such input data? I don't want to create such invalid XML. As far as I can think, \r will normally only appear in BytesText, due to line-end differences between operating systems.
<root attribute="\r\r\n\nvalue1\r\r\n\nvalue2\r\r\n\n">\r\r\n\nvalue3\r\r\n\nvalue4\r\r\n\n</root>
f970370: this commit is needed because I forgot to normalize when the input has nothing to unescape.
I added a case from your input; I don't know whether the result is what you expected:
#[test]
fn line_ends() {
    const XML: &str = "<root attribute=\"\r\r\n\nvalue1\r\r\n\nvalue2\r\r\n\n\">\r\r\n\nvalue3\r\r\n\nvalue4\r\r\n\n</root>";
    let mut reader = Reader::from_str(XML);
    match reader.read_event().unwrap() {
        Event::Start(event) => {
            let mut iter = event.attributes();
            let a = iter.next().unwrap().unwrap();
            #[cfg(not(feature = "encoding"))]
            assert_eq!(
                a.unescape_value().unwrap(),
                "\n\n\nvalue1\n\n\nvalue2\n\n\n"
            );
            assert_eq!(
                a.decode_and_unescape_value(reader.decoder()).unwrap(),
                "\n\n\nvalue1\n\n\nvalue2\n\n\n"
            );
        }
        event => panic!("Expected Start, found {:?}", event),
    }
    match reader.read_event().unwrap() {
        Event::Text(event) => {
            assert_eq!(event.unescape().unwrap(), "\n\n\nvalue3\n\n\nvalue4\n\n\n")
        }
        event => panic!("Expected Text, found {:?}", event),
    }
}
There is a case in serde-se.rs that always fails. I don't know whether we should modify the serde implementation:
serialize_as!(tuple:
    // Use to_string() to get owned type that is required for deserialization
    ("<\"&'>".to_string(), "with\t\r\n spaces", 3usize)
    => "<root><\"&'></root>\
        <root>with\t\r\n spaces</root>\
        <root>3</root>");
---- with_root::tuple stdout ----
thread 'with_root::tuple' panicked at tests/serde-se.rs:1956:5:
deserialization roundtrip: Custom("invalid type: string \"with\\t\\n spaces\", expected a borrowed string")
The test fails because there is a \r in the roundtrip test. The serialized and deserialized values cannot be equal because of line-end normalization. Therefore I had to remove the \r from the roundtrip test to get the case to pass.
I did more line-end normalization. I also changed the normalization implementation to use an iterator, to avoid an extra allocation during normalization.
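As an illustration of the iterator approach, here is a minimal sketch of a lazy adapter over the characters of the input. NormalizedLineEnds is a hypothetical name and is not the type used in this PR; the only assumption is that \r\n and a lone \r should both become \n.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical adapter: yields the characters of the input with
/// "\r\n" and lone "\r" replaced by "\n", without building a new String.
struct NormalizedLineEnds<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> NormalizedLineEnds<'a> {
    fn new(input: &'a str) -> Self {
        Self {
            chars: input.chars().peekable(),
        }
    }
}

impl<'a> Iterator for NormalizedLineEnds<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        match self.chars.next()? {
            '\r' => {
                // Consume the '\n' of a "\r\n" pair, if there is one.
                self.chars.next_if_eq(&'\n');
                Some('\n')
            }
            c => Some(c),
        }
    }
}
```

The adapter itself does not allocate. An allocation happens only if the caller collects the characters, for example with NormalizedLineEnds::new(text).collect::<String>().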
Regarding #806, I added a normalize_line_end function in the escape module, plus related tests. If the unescape function is called, line ends will be normalized.
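For readers who want a feel for what such a function might look like, here is a minimal standalone sketch. It is not the implementation from this PR: the name normalize_line_end_sketch and the choice to return a Cow are assumptions made for illustration, so that input without any \r is returned borrowed.

```rust
use std::borrow::Cow;

/// Hypothetical sketch: replaces "\r\n" and lone "\r" with "\n".
/// Input that contains no '\r' is returned as-is, without allocating.
fn normalize_line_end_sketch(input: &str) -> Cow<'_, str> {
    if !input.contains('\r') {
        return Cow::Borrowed(input);
    }
    // Replace "\r\n" first so a pair collapses to one '\n',
    // then replace every remaining lone '\r'.
    Cow::Owned(input.replace("\r\n", "\n").replace('\r', "\n"))
}
```

Applied to the text content from the test input in this thread, \r\r\n\nvalue3 becomes \n\n\nvalue3, which matches the expectations in line_ends.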