rust-syndication / rss

Library for serializing the RSS web content syndication format
https://crates.io/crates/rss
Apache License 2.0

Extensions not recognized when namespaces are declared inline #154

Closed: SSheldon closed this issue 1 year ago

SSheldon commented 1 year ago

The following fails:

use rss::Channel;

let input = r#"<?xml version="1.0" encoding="UTF-8"?>
<rss>
    <channel>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Creator</dc:creator>
    </channel>
</rss>
"#;
let channel = input.parse::<Channel>().unwrap();

assert!(channel.dublin_core_ext().is_some());
assert_eq!(
    channel.dublin_core_ext().unwrap().creators,
    vec!["Creator"]
);

I have seen RSS like this in the wild, for example in Rock Paper Shotgun's feed.

My searching suggests that it is valid XML to declare a namespace on the same element that uses it.
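The scoping rule being relied on here comes from the Namespaces in XML recommendation: a declaration applies to the element it appears on as well as to that element's descendants. This means the prefix of dc:creator can be resolved from that element's own attributes. The following is a minimal std-only illustration. The helper namespace_from_own_attrs and the (name, value) attribute pairs are hypothetical and are not part of the rss crate.

```rust
/// Hypothetical helper: return the namespace URI that an element's own
/// attributes bind to the element's prefix, if any.
/// `qname` is the qualified element name, e.g. "dc:creator";
/// `attrs` are that element's attributes as (name, value) pairs.
fn namespace_from_own_attrs<'a>(qname: &str, attrs: &[(&str, &'a str)]) -> Option<&'a str> {
    // "dc:creator" is declared by "xmlns:dc"; an unprefixed name
    // is declared by the default-namespace attribute "xmlns".
    let decl_name = match qname.split_once(':') {
        Some((prefix, _local)) => format!("xmlns:{prefix}"),
        None => "xmlns".to_string(),
    };
    attrs
        .iter()
        .find(|(name, _)| *name == decl_name)
        .map(|(_, value)| *value)
}

fn main() {
    // Attributes of <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">
    let attrs = [("xmlns:dc", "http://purl.org/dc/elements/1.1/")];
    let ns = namespace_from_own_attrs("dc:creator", &attrs);
    assert_eq!(ns, Some("http://purl.org/dc/elements/1.1/"));
}
```

A real parser would check the element's own declarations first. It would then fall back to declarations inherited from its ancestors.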

This looks nontrivial to fix, because currently the namespace map is only populated with namespaces declared on the root rss element: https://github.com/rust-syndication/rss/blob/69336ea07a2f26094d4e14dbd34f948605a5496f/src/channel.rs#L1049-L1056
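One possible direction is sketched here under the assumption that the reader tracks start and end tags itself. The approach keeps a stack of prefix-to-URI frames. It pushes each element's xmlns:* declarations before resolving that element's name, and pops them at the matching end tag. NamespaceScopes and its methods are hypothetical names and do not reflect the crate's current code. quick-xml, which rss builds on, has a namespace-aware reader in recent versions, so it may be worth checking that before hand-rolling this.

```rust
use std::collections::HashMap;

/// Hypothetical scoped namespace resolver: one frame of prefix -> URI
/// bindings per open element, innermost frame last.
struct NamespaceScopes {
    frames: Vec<HashMap<String, String>>,
}

impl NamespaceScopes {
    fn new() -> Self {
        NamespaceScopes { frames: Vec::new() }
    }

    /// Call on each start tag with the xmlns:* declarations found on that
    /// tag, before resolving the tag's own prefix.
    fn push(&mut self, decls: &[(&str, &str)]) {
        let frame = decls
            .iter()
            .map(|(prefix, uri)| (prefix.to_string(), uri.to_string()))
            .collect();
        self.frames.push(frame);
    }

    /// Call on each end tag so that element's declarations leave scope.
    fn pop(&mut self) {
        self.frames.pop();
    }

    /// Search from the innermost frame outwards, so an inner declaration
    /// shadows an outer one that uses the same prefix.
    fn resolve(&self, prefix: &str) -> Option<&str> {
        self.frames
            .iter()
            .rev()
            .find_map(|frame| frame.get(prefix))
            .map(String::as_str)
    }
}

fn main() {
    let mut scopes = NamespaceScopes::new();
    scopes.push(&[]); // <rss> declares nothing
    scopes.push(&[]); // <channel> declares nothing
    // <dc:creator xmlns:dc="..."> is in scope of its own declaration.
    scopes.push(&[("dc", "http://purl.org/dc/elements/1.1/")]);
    assert_eq!(scopes.resolve("dc"), Some("http://purl.org/dc/elements/1.1/"));
    scopes.pop(); // </dc:creator>
    assert_eq!(scopes.resolve("dc"), None);
}
```

Because resolve searches innermost-first, the same structure also copes with a prefix being rebound on an inner element.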

SSheldon commented 1 year ago

Heh, complicating things: I think it's also technically valid to rebind a namespace prefix to a different namespace:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
    <channel>
        <itunes:creator xmlns:itunes="http://purl.org/dc/elements/1.1/">Creator</itunes:creator>
    </channel>
</rss>

(I have never seen RSS like this, and it would have to be a cruel joke.)

The fact that ExtensionMap is keyed by namespace prefix rather than by namespace name (the URI) makes this trickier to resolve. Elements in the same document can technically use the same prefix bound to different namespace names.
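The following hypothetical alternative makes the prefix-versus-URI point concrete. It keys extensions by resolved namespace URI instead of by prefix, and is applied to the redefinition example above. UriKeyedExtensions and insert_element are illustrative names only; as noted, the crate's real ExtensionMap is keyed by prefix. For brevity, in-scope bindings are passed as a flat list ordered outermost-first.

```rust
use std::collections::BTreeMap;

/// Hypothetical URI-keyed store: namespace URI -> local name -> text values.
/// This is not the rss crate's ExtensionMap, which is keyed by prefix.
type UriKeyedExtensions = BTreeMap<String, BTreeMap<String, Vec<String>>>;

/// Record an extension element under its resolved namespace URI.
/// `bindings` lists the prefix -> URI pairs in scope, outermost first.
fn insert_element(
    map: &mut UriKeyedExtensions,
    bindings: &[(&str, &str)],
    qname: &str,
    text: &str,
) -> Result<(), String> {
    let (prefix, local) = qname
        .split_once(':')
        .ok_or_else(|| format!("{qname} has no prefix"))?;
    // Search innermost-first so a redefinition shadows the outer binding.
    let uri = bindings
        .iter()
        .rev()
        .find(|(p, _)| *p == prefix)
        .map(|(_, uri)| *uri)
        .ok_or_else(|| format!("unbound prefix {prefix}"))?;
    map.entry(uri.to_string())
        .or_default()
        .entry(local.to_string())
        .or_default()
        .push(text.to_string());
    Ok(())
}

fn main() {
    const ITUNES: &str = "http://www.itunes.com/dtds/podcast-1.0.dtd";
    const DC: &str = "http://purl.org/dc/elements/1.1/";
    // <rss xmlns:itunes=ITUNES> ... <itunes:creator xmlns:itunes=DC>
    let bindings = [("itunes", ITUNES), ("itunes", DC)];
    let mut map = UriKeyedExtensions::new();
    insert_element(&mut map, &bindings, "itunes:creator", "Creator").unwrap();
    // Stored under the Dublin Core URI, not under the "itunes" prefix.
    assert_eq!(map[DC]["creator"], vec!["Creator"]);
    assert!(!map.contains_key(ITUNES));
}
```

With URI keys, "is this Dublin Core?" becomes a lookup on the DC URI, whatever prefix the feed happened to use.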