microformats / php-mf2

php-mf2 is a pure, generic microformats-2 parser for PHP. It makes HTML as easy to consume as JSON.
Creative Commons Zero v1.0 Universal

Try to parse any file with the HTML parser #209

Open Zegnat opened 5 years ago

Zegnat commented 5 years ago

This became clear when inspecting indieweb/indiewebify-me#78.

@sknebel noted that several parsers will actually run on resources like Atom files, even though those are not HTML, and are then able to extract some useful data such as link relationships. php-mf2 does this too, except in cases where it first has to fetch the file from a remote URL.

However, files fetched from remote URLs must be served as HTML before they are allowed through to parsing. Should we remove this limitation?

Note that it is probably not technically correct to run any of the parser code on non-HTML documents. While Atom happens to include link elements the same way HTML does, that may not be true for all generic XML documents. I am unsure what the actual harm would be; minimal, I expect.
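
To illustrate the observation above that an HTML parser can still pull link relationships out of an Atom file, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not php-mf2 code and does not reflect how php-mf2 is implemented; the `LinkRelCollector` class, the `collect_link_rels` helper, and the sample feed are all made up for this example.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect rel -> [href, ...] from every <link> element encountered."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and also routes self-closing
        # tags such as <link ... /> through this method.
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated list of relationship names.
        for rel in (attributes.get("rel") or "").split():
            self.rels.setdefault(rel, []).append(href)


# Hypothetical Atom feed, invented for this example.
ATOM_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/feed.atom"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""


def collect_link_rels(markup):
    """Run the HTML parser over arbitrary markup and return the link rels."""
    parser = LinkRelCollector()
    parser.feed(markup)
    parser.close()
    return parser.rels


print(collect_link_rels(ATOM_FEED))
```

The HTML parser never checks that the input is HTML, so the Atom `link` elements are picked up like any others. As noted above, this works only because Atom happens to use the same element and attribute names; other XML vocabularies may not.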

sknebel commented 5 years ago

fetch should probably at least also accept application/xhtml+xml.
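
A minimal sketch of what such an allow-list check could look like, written in Python purely for illustration. This is not the check php-mf2 performs; the names `ALLOWED_MEDIA_TYPES`, `media_type`, and `is_parseable` are hypothetical.

```python
# Media types a fetch step might let through to the parser.
# Assumption for this sketch: only these two are accepted.
ALLOWED_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_parseable(content_type_header):
    """Decide whether a response with this Content-Type should be parsed."""
    if not content_type_header:
        return False
    return media_type(content_type_header) in ALLOWED_MEDIA_TYPES


for header in (
    "text/html; charset=utf-8",
    "Application/XHTML+XML",
    "application/atom+xml",
):
    print(header, is_parseable(header))
```

Dropping the restriction entirely, as the original question proposes, would amount to skipping this check and handing every response body to the parser.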

snarfed commented 3 months ago

Got a request for this for Bridgy Publish: https://github.com/snarfed/bridgy/issues/1766