This became clear when inspecting indieweb/indiewebify-me#78.
@sknebel noted that several parsers will actually run on resources like Atom files, even though those are not HTML, and are then able to extract some useful data such as link relationships. php-mf2 does this too, except in cases where it first has to fetch the file from a remote URL.
Files fetched from remote URLs, however, have to be HTML before they are allowed through to parsing. Should we remove this limitation?
Note that it is probably not technically correct to run any of the parser code on non-HTML documents. Atom happens to use `link` elements the same way HTML does, but that may not be true for all generic XML documents. I am unsure what the actual harm would be; I expect it to be minimal.
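
To illustrate the point, here is a minimal sketch in Python (not php-mf2 code) of what happens when a lenient HTML parser is fed non-HTML input. `RelLinkCollector`, `collect_rels`, and the sample feeds are hypothetical names and data made up for this example; it only collects `rel` values and is not a microformats parser.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect rel -> [href, ...] from <link> and <a> start tags."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> also end up here.
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of relationship names.
        for rel in (attributes.get("rel") or "").split():
            self.rels.setdefault(rel, []).append(href)


def collect_rels(markup):
    """Run the lenient HTML parser over any markup, HTML or not."""
    parser = RelLinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.rels


# Hypothetical Atom feed: its <link> elements carry rel and href like HTML.
ATOM = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/feed.atom"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""

# Hypothetical RSS feed: <link> holds the URL as text, with no rel or href.
RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>https://example.com/</link>
</channel></rss>"""

atom_rels = collect_rels(ATOM)
rss_rels = collect_rels(RSS)
print(atom_rels)
print(rss_rels)
```

In this sketch the Atom feed yields its `self` and `hub` relationships, while the RSS feed yields nothing because its `link` element has different semantics. That matches the expectation above: on XML that does not follow the HTML convention, the likely outcome is empty or missing data rather than a failure.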