microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing
14 stars 6 forks source link

backcompat child of mf2 root #11

Closed sknebel closed 6 years ago

sknebel commented 7 years ago

This question is to clarify what happens when a backcompat class appears as a child of an mf2 root, since both parser behavior and discussions have varying interpretations.

This markup is derived from what I today encountered on https://indieweb.org/events/2017-04-26-homebrew-website-club,

<span class="h-card vcard">
<a href="http://cherryreds.com">
  <span class="p-name fn p-org org">Cherry Red's</span>
</a>, 
<span class="adr">
  <span class="street-address p-street-address">88-92 John Bright St</span>, 
  <span class="p-locality locality">Birmingham</span>, 
  <abbr class="p-country-name country-name">UK</abbr>
</span></span>

spec reading

from the parsing specification:

  • parse a child element for microformats (recurse)
    • if that child element itself has a microformat ("h-*" or backcompat roots) and is a property element, add it into the array of values for that property as a { } structure, add to that { } structure:
      • value:
      • if it's a p- property element, use the first p-name of the h- child
        • else if it's an e-* property element, re-use its { } structure with existing value: inside.
        • else if it's a u- property element and the h- child has a u-url, use the first such u-url
        • else use the parsed property value per p-*, u-*, dt-* parsing respectively
    • else add found elements that are microformats to the "children" array

While the section for properties explicitly calls out microformats as "h-*" or backcompat roots, the one for children does not, but given the wording I'd still interpret it as meaning the same.

Also, the note backward compatibility details section says that backcompat classes are ignored in mf2 and mf2 classes in mf1 are ignored

without an intervening root class name

meaning a backcompat root can change processing to backcompat, and this is not explicitly limited to properties

parser results

Of the 4 main parsers supporting backcompat, there are 3 different results (unrelated bugs have been removed/ignored):

1A: child object

In what would also be my interpretation, mf2py and microformats-ruby recognize the adr class as a backcompat object and, since it doesn't have a property name, turn it into a child of the surrounding h-card:

{
  "items": [
    {
      "type": ["h-card"],
      "properties": {
        "name": ["Cherry Red's"],
        "org": ["Cherry Red's"],
        "url": ["http://cherryreds.com"]
      },
      "children": [
        {
        "type": ["h-adr"],
          "properties": {
            "street-address": ["88-92 John Bright St"],
            "locality": ["Birmingham"],
            "country-name": ["UK"]
          }
        }
      ]}]}

1B: ignore adr

php-mf2 ignores the adr and consequently parses the mf2 properties inside it as properties of the surrounding h-card (which is in this case the result the original author probably intended)

{
  "items": [
    {
      "type": ["h-card"],
      "properties": {
        "name": ["Cherry Red's"],
        "org": ["Cherry Red's"],
        "url": ["http://cherryreds.com"],
        "street-address": ["88-92 John Bright St"],
        "locality": ["Birmingham"],
        "country-name": ["UK"]
        }}]}

1C: apply backcompat rules =>as if "p-adr h-adr"

microformats-shiv (or more specifically, microformats-node based on it) seems to apply the backcompat rules for vcard.adr, turning the adr into the equivalent of p-adr h-adr. This IMHO clearly shouldn't be applied to a MF2 root like h-card, but there is an example in the test suite requiring this behavior(?!).

{
  "items": [
    {
      "type": ["h-card"],
      "properties": {
        "name": ["Cherry Red's"],
        "org": ["Cherry Red's"],
        "url": ["http://cherryreds.com"],
        "adr": [
          {
            "value": "88-92 John Bright St, \n   Birmingham, \n   UK", 
            "type": ["h-adr"],
            "properties": {
              "street-address": ["88-92 John Bright St"],
              "locality": ["Birmingham"], "country-name":["UK"]
            }
          }]
        }}]}

(Side note: If the markup is changed from class="adr" to class="p-adr adr" (backcompat property value instead of child), microformats-shiv, mf2py and microformats-ruby produce the same result as 1C, whereas mf2-php flattens the "adr" property to a string)

Related material

In this issue, @aaronpk says:

Based on the mf2 wiki, I'm not totally clear whether this should be ignored or not: http://microformats.org/wiki/microformats2-parsing#note_backward_compatibility_details However, it was surprising to me that it's there, and I don't want it to show up in the mf2 parsed version.

and @gRegorLove seems to say that nested mf1 should be ignored.

Zegnat commented 7 years ago

My interpretation matched that of sknebel so I would lean towards 1A.

1B seems clearly wrong as the adr class is indicative of the old adr and has backwards compatibility parsing defined by h-adr.

I get where 1C is coming from but I feel its definition is confusing. If this rule should exist because adr was often used as a direct child of a different object without property definition (as this distinction wasn’t made in mf1) I would love to see the specification making this more explicit. Something like:

If a backwards compatible adr object is found as a child (not property) of an h-card or h-event, do not add it to children and instead add it as p-adr.


Also see the discussion between @sknebel and myself on IRC.

gRegorLove commented 7 years ago

adr (and geo) is a confusing one I always have to spend some time wrapping my head around when it comes to backcompat. My understanding is that it is always a root mf1 class, and sometimes a property of vcard

So adr appearing inside an h-card should be ignored as an mf1 property, but it should be backcompat parsed as an mf1 root, per http://microformats.org/wiki/h-adr#Backward_Compatibility, which would make it a child object.

So I think 1A is correct as the spec is worded now. 1B appealed to me at first since it's closer to what the publisher intended, but I think it's reasonable for publishers that are adding/updating to mf2 to be expected to update vcard => h-card and adr => p-adr h-adr so their intent is clear. I'd prefer that over the suggested backcompat exception.

Other discussions: https://chat.indieweb.org/microformats/2017-05-01#t1493664172920000 https://chat.indieweb.org/microformats/2017-03-06#t1488831469687000

sknebel commented 6 years ago

We just encountered this difference in handling between mf2py and php-mf2 while trying to help debug Bridgy on a WordPress site with the following structure:

<body class="h-entry">
  <div id="page" class="hfeed site wrap">
    <h1 class="entry-title"><span class='p-name'>title</span></h1>
    other content
    <div class="entry-content">
      <div class="e-content">this is a test for indieweb post </div> <span class="syn-text">Also on:</span>
<!--syndication links -->
    </div>
  </div>
</body>

Again, completely ignoring the backcompat hfeed leads to the result from an mf2 parser the author intended (so 1B above), while mf2py turned the hfeed into a backcompat root (it handled it's properties wrongly, but that is an unrelated issue), following 1A and what IMHO is closest to the wording here. In mf2py the h-entry thus had no properties (since they were inside the hfeed) and the parser fell back to implied-property handling, causing the entire page content, menus and all, to appear in the name property and being published to Twitter.

1A results in something like (note no properties on the feed, due to there being no mf1 properties, something mf2py currently does wrong)

{"items": [
{"type": ["h-entry"],
"properties": {
  "name": ["titleother content\nthis is a test for indieweb post Also on:"]
},
"children": [
  {
    "type": ["h-feed"],
    "properties": {}

1B results in

"items": [
 {
   "type": ["h-entry"],
   "properties": {
      "name": ["title"],
      "content": [
        {
          "html": "this is a test for indieweb post ",
          "value": "this is a test for indieweb post"
        }
      ]

EDIT: added examples for outputs.

sknebel commented 6 years ago

Again, 1B (ignoring mf1 inside mf2 roots completely) leads to the more usable result.

But as laid out in the first post does not seem to match the spec language, which presumably is there for a reason: Can anyone involved in writing the backcompat rules clarify the thoughts behind allowing mf1 nested in mf2? I could not find anything on the old issues pages. I guess that's @tantek @glennjones?

I guess this turns into 2 questions:

  1. confirm that the current spec means A?
  2. if it does, is that what we want despite scenarios where B produced the "better" output (counter-examples?), or should the spec adjust to follow php-mf2 here?
kartikprabhu commented 6 years ago

The new mf2py https://github.com/kartikprabhu/mf2py/ should consistently do 1A according to current spec. Checked on above examples.

gRegorLove commented 6 years ago

Confirming php-mf2 master branch does 1A now as well. Note the second example doesn't imply the h-entry name due to https://github.com/microformats/microformats2-parsing/issues/6

{
    "items": [
        {
            "type": [
                "h-card"
            ],
            "properties": {
                "name": [
                    "Cherry Red's"
                ],
                "org": [
                    "Cherry Red's"
                ],
                "url": [
                    "http://cherryreds.com"
                ]
            },
            "children": [
                {
                    "type": [
                        "h-adr"
                    ],
                    "properties": {
                        "street-address": [
                            "88-92 John Bright St"
                        ],
                        "locality": [
                            "Birmingham"
                        ],
                        "country-name": [
                            "UK"
                        ]
                    }
                }
            ]
        }
    ],
    "rels": {},
    "rel-urls": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "dev-master",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}
{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": [],
            "children": [
                {
                    "type": [
                        "h-feed"
                    ],
                    "properties": []
                }
            ]
        }
    ],
    "rels": {},
    "rel-urls": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "dev-master",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}
sknebel commented 6 years ago

Thanks for the updates @gRegorLove and @kartikprabhu!

Since everyone seems on the same page about what the spec says, we have active parser implementers confirming that their parsers match, the wrong test case has been fixed, and a bug against the JS parser has been filed, I'll close this as is.

Thanks everyone!