microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

parse HTML id attribute #44

Closed sknebel closed 5 years ago

sknebel commented 5 years ago

In a few places, being able to consume the HTML id attribute would be useful.

use cases

  1. To consume fragment links that identify the relevant microformats object
  2. When following pages with multiple feeds, a consumer needs to find the same feed again, while the page author should stay free to move elements around on the page
    • feature requested e.g. by @dshanske

output format

I'd propose a new 'id' attribute on the microformats object (not a property), i.e.

<div class="h-feed" id="updates">
<a class="u-author h-card" href="https://example.com">Max Mustermann</a>
<li class="h-entry">[...]</li>
[...]
</div>

would produce output like

{
    "items": [
        {
            "type": [ "h-feed"],
            "id": "updates",      <------------------
            "properties": {
                "author": ...
            },
            "children": [
                {
                    "type": [
                        "h-entry"
                    ],
                    ...
}

This format should be completely backwards compatible.
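With output shaped like the example above, use case 1 (resolving a fragment link to a microformats object) might be handled roughly as follows. This is only a sketch, not part of the proposal: `find_by_fragment` is a hypothetical helper, it walks top-level items and their `children` only, and it compares the raw fragment without percent-decoding.

```python
from urllib.parse import urldefrag


def find_by_fragment(parsed, url):
    """Return the first microformats object whose "id" equals the URL fragment.

    Hypothetical helper: assumes `parsed` is parser output with the proposed
    top-level "id" key on microformats objects.
    """
    fragment = urldefrag(url).fragment
    if not fragment:
        return None
    # Depth-first walk over top-level items and nested "children".
    # Microformats nested inside property values are not searched here.
    stack = list(reversed(parsed.get("items", [])))
    while stack:
        item = stack.pop()
        if item.get("id") == fragment:
            return item
        stack.extend(reversed(item.get("children", [])))
    return None


parsed = {
    "items": [
        {
            "type": ["h-feed"],
            "id": "updates",
            "properties": {},
            "children": [{"type": ["h-entry"], "properties": {}}],
        }
    ]
}

feed = find_by_fragment(parsed, "https://example.com/#updates")
```

A consumer that never looks at `id` keeps reading `type`, `properties` and `children` exactly as before, which is the sense in which the new key is backwards compatible.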

imply uid?

In the discussion in IRC and in https://github.com/microformats/php-mf2/issues/206, it was also proposed to automatically imply a uid property based on the document URL and the id as a fragment.

I don't think this is a good idea for a few reasons:

sknebel commented 5 years ago

spec change proposal

Extend http://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats with the new last bullet point:

  • else if found, start parsing a new microformat
    • keep track of whether the root class name(s) was from backcompat
    • create a new { } structure with:
      • type: [array of unique microformat "h-*" type(s) on the element sorted alphabetically],
      • properties: { } - to be filled in when that element itself is parsed for microformats properties
      • if the element has a non-empty HTML id attribute: id: string value of the HTML id attribute of the element
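As a rough illustration of the proposed bullet points, the step that creates the new structure could look like this in a parser. It is a simplified sketch under stated assumptions: `new_microformat` is a hypothetical function, the class names are assumed to be already split into a list, any class starting with `h-` is treated as a root class name, and backcompat tracking is left out.

```python
def new_microformat(class_names, attributes):
    """Build the initial structure for an element that starts a microformat.

    Hypothetical sketch: `class_names` is the element's list of class names,
    `attributes` is a dict of its HTML attributes.
    """
    # Unique "h-*" types, sorted alphabetically (simplified root class test).
    types = sorted({name for name in class_names if name.startswith("h-")})
    item = {"type": types, "properties": {}}
    element_id = attributes.get("id")
    if element_id:  # skips both a missing and an empty id attribute
        item["id"] = element_id
    return item


feed = new_microformat(["h-feed"], {"id": "updates"})
entry = new_microformat(["h-entry"], {"id": ""})
```

Here `feed` carries `"id": "updates"`, while `entry` has no `id` key at all because its id attribute is empty.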

EDIT: clarified that the id has to be non-empty (an empty id isn't valid HTML anyway).

gRegorLove commented 5 years ago

Sounds like good reasoning and a reasonable spec update. I'm in favor and can implement in php-mf2 pretty easily.

dshanske commented 5 years ago

As a user of the php-mf2 parser in my Parse This library, I would find this useful.

jalcine commented 5 years ago

This could help out quite a bit with the Elixir implementation of Microformats2. I do see the potential issue with using u-uid and have been opting to use u-uid in Koype but this would make things more explicit (which is better).

dshanske commented 5 years ago

I changed my post-processing of the parser output to take the id, now available in the php-mf2 master branch, and use it to create a URL with a fragment for each feed, which lets me enumerate the feeds individually. That will help me parse them as individual elements should someone request a specific feed.
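The post-processing described here lives in PHP and is not shown in the thread. Purely as an illustration of the idea, a Python sketch of deriving one URL with a fragment per feed might look like this. `feed_urls` is a hypothetical name, and only top-level `h-feed` items that carry an `id` are considered.

```python
from urllib.parse import quote, urldefrag


def feed_urls(parsed, document_url):
    """List one URL with a fragment per top-level h-feed that has an "id".

    Hypothetical sketch, not the Parse This implementation.
    """
    # Drop any fragment already present on the document URL.
    base = urldefrag(document_url).url
    urls = []
    for item in parsed.get("items", []):
        if "h-feed" in item.get("type", []) and item.get("id"):
            # Percent-encode characters that are not safe in a URL.
            urls.append(base + "#" + quote(item["id"]))
    return urls


parsed = {
    "items": [
        {"type": ["h-feed"], "id": "updates", "properties": {}},
        {"type": ["h-feed"], "id": "photos", "properties": {}},
        {"type": ["h-card"], "properties": {}},
    ]
}

urls = feed_urls(parsed, "https://example.com/page#top")
```

Each resulting URL can then be stored as the identifier of one feed, so a later request for that URL can be matched back to the same feed even if the author reorders the page.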

tantek commented 5 years ago

Resolution: proposal accepted.

No objections in above discussion, and positive opinions (👍) from a few implementors on the proposal.

Proposal implementations in the mf2py and php-mf2 parsers, and verification by https://github.com/dshanske that the php-mf2 implementation satisfies the use case for the issue, are sufficient to demonstrate implementability and utility, all as noted/linked in the issue thread.

Editing specification accordingly.

(Originally published at: http://tantek.com/2018/364/t3/)