Try to parse any file with the HTML parser

Zegnat commented 5 years ago

This became clear when inspecting indieweb/indiewebify-me#78.

@sknebel noted that several parsers will actually run on resources like Atom files, even though those are not HTML, and are then able to extract some useful data such as link relationships. php-mf2 does this too, except in cases where it first has to fetch the file from a remote URL.

But: files fetched from remote URLs have to be HTML to be allowed through to parsing. Should we remove this limitation?

Note that it is probably not technically correct to run any of the parser code on non-HTML documents. While Atom happens to include link elements the same way as HTML, that may not be true for all generic XML documents. I am unsure what the actual harm would be. Minimal, I expect.
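To illustrate why an HTML parser can still pull link relationships out of an Atom file, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than php-mf2, and the `LinkRelCollector` class, the `collect_rels` helper, and the sample feed are all made up for this example. It only shows the general idea, not how any particular microformats parser is implemented.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collects href values of <link> elements, grouped by rel value."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and also routes self-closing
        # tags such as <link ... /> through this method.
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # A rel attribute may hold several space-separated values.
        for rel in (attr.get("rel") or "").split():
            self.rels.setdefault(rel, []).append(href)


# Hypothetical Atom document; the URLs are placeholders.
ATOM = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/feed.atom"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""


def collect_rels(markup):
    """Run the HTML parser over arbitrary markup and return its rels."""
    parser = LinkRelCollector()
    parser.feed(markup)
    parser.close()
    return parser.rels


rels = collect_rels(ATOM)
```

This works only because Atom's `link` elements carry `rel` and `href` attributes just like HTML's do. A generic XML document with a different vocabulary would yield nothing, or possibly misleading matches.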

sknebel commented 5 years ago

fetch should probably at least also accept application/xhtml+xml.
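A rough sketch of what such a check could look like, written in Python for illustration only. The names `ALLOWED_TYPES`, `media_type`, and `should_parse`, and the `allow_any` switch, are hypothetical and do not reflect php-mf2's actual fetch code.

```python
# Media types let through to the parser in this sketch (an assumption,
# not php-mf2's real list).
ALLOWED_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def should_parse(content_type_header, allow_any=False):
    """Decide whether a fetched response should be handed to the parser.

    With allow_any=True the Content-Type restriction is lifted entirely,
    which is the behaviour this issue proposes.
    """
    if allow_any:
        return True
    return media_type(content_type_header) in ALLOWED_TYPES
```

With the allow-list as written, `should_parse("application/atom+xml")` is false, while passing `allow_any=True` lets the same response through.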

snarfed commented 3 months ago

Got a request for this for Bridgy Publish: https://github.com/snarfed/bridgy/issues/1766

microformats / php-mf2

Try to parse any file with the HTML parser #209