Closed rrix closed 1 year ago
Yeah, my understanding is that any invalid XML in a feed should be either escaped as you say, or stuffed in a <![CDATA[]]>
so that the parser skips over it.
Having tried your feed in the w3c validator it actually fails for all sorts of reasons: https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Farcology.garden%2Fupdates.xml, starting at duplicate id
attribute in some entries.
If something is good enough for the W3C validator but failing in hookshot I'd consider that a bug – but this is clearly invalid XML, RSS standards aside.
Yeah, I think given we're seeing very low rates of parsing failures in hookshot (about 0.401%, well within the margin of error) I think we're likely relaxed enough about our parsing. I think if W3C starts saying your feed is valid @rrix and hookshot still complains, we can poke at it.
it's HTTP 200 Okay, not HTTP 200 Perfect, but okay lol
i'll just go back to running my own bot which parses the feeds just fine, but thanks for looking :)
@Half-Shot tried to resolve this in 61f25fae3611b753cb7ac07a1f2f41942780b83d but this unfurled in to more trouble.
I have a feed which does not parse:
That line is a youtube copy-pasta embed name of
<iframe width="560" height="315" src="https://www.youtube.com/embed/V9vQ06n7HX4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
-- it doesn't like the allowfullscreen attributeW3C feed validator says it's more-or-less valid and while i could probably coerce pandoc in to spitting out xhtml for the feeds, i'd rather not. Meanwhile, a close reading of RFC4287 s3.1.1.2 would reveal that this feed isn't actually valid and that I should be escaping the
<
elements so they escape the parser.I'll do that, maybe this weekend, but this feed worked fine with go-neb so it caught me a bit by surprise.
In general I think feed readers should behave more like browsers in trying to handle parsing and presentation errors with the goal of having a document to present even if it's not "valid"
cheers