microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

Unclear interaction between property-microformat collapsing and implied properties #66

Open JKingweb opened 1 year ago

JKingweb commented 1 year ago

The general parsing rules state:

  • if that child element itself has a microformat ("h-*" or backcompat roots) and is a property element, add it into the array of values for that property as a { } structure, add to that { } structure:
    • value:
    • if it's a p-* property element, use the first p-name of the h-* child
    • else if it's an e-* property element, re-use its { } structure with existing value: inside.
    • else if it's a u-* property element and the h-* child has a u-url, use the first such u-url
    • else use the parsed property value per p-*,u-*,dt-* parsing respectively

A strict reading excludes the implied name and url properties (technically, they are not p- or u- properties) even though they are suitable values. As a result, the parent's name property below has a value of ABBA rather than C, which is the child's own name:

<div class="h-parent">
  <div class="p-name h-child">
    <div>
      A<abbr title="C">BB</abbr>A
    </div>
  </div>
</div>

Current parser behaviour:

  • C: PHP, JavaScript, Go, Rust, Haskell, Ruby
  • ABBA: Python
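
To make the two readings concrete, here is a minimal Python sketch. It is not taken from any parser: `child`, `explicit`, and both function names are hypothetical. The `explicit` set (the properties that were actually marked up with a class name, as opposed to implied) is an assumption added purely for illustration.

```python
# Hypothetical parsed result for the h-child element above.
# Its name is implied (taken from the abbr title), not marked up with p-name.
child = {"type": ["h-child"], "properties": {"name": ["C"]}}
explicit = set()          # assumption: names of properties found via class names
parsed_p_value = "ABBA"   # text content of the property element, per p-* parsing


def value_strict(child, explicit, parsed_value):
    """Strict reading: only an explicitly marked-up p-name is used."""
    names = child["properties"].get("name", [])
    if "name" in explicit and names:
        return names[0]
    return parsed_value  # fall through to plain p-* parsing


def value_lenient(child, parsed_value):
    """Lenient reading: any parsed name, implied or not, is used."""
    names = child["properties"].get("name", [])
    return names[0] if names else parsed_value


print(value_strict(child, explicit, parsed_p_value))  # ABBA
print(value_lenient(child, parsed_p_value))           # C
```

The only difference between the two functions is whether an implied name is allowed to satisfy the "first p-name" rule, which is exactly the ambiguity in question.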

gRegorLove commented 1 year ago

Since the parser has already recursed and parsed the child element at that point, I wonder if these lines should be changed to use the parsed properties from the child.

This line:

if it's a p-* property element, use the first p-name of the h-* child

Could become:

if it's a p-* property element, use the parsed name property of the h-* child

  • If the parsed name property is a { } structure, use its value property
  • Else use the first value in the name array

And so on for the other prefixes.

I think this is what php-mf2 does in practice. I wonder what the other parsers do.

A php-mf2 example with odd usage of e-name to demonstrate the above:

<div class="h-feed">
  <article class="p-x-articles h-entry">
    <h1 class="e-name"><b>Lorem ipsum</b></h1>
  </article>
</div>

"type": [
    "h-feed"
],
"properties": {
    "x-articles": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "name": [
                    {
                        "html": "<b>Lorem ipsum</b>",
                        "value": "Lorem ipsum"
                    }
                ]
            },
            "value": "Lorem ipsum"
        }
    ]
}

JKingweb commented 1 year ago

> Since the parser has already recursed and parsed the child element at that point, I wonder if these lines should be changed to use the parsed properties from the child.
>
> This line:
>
> if it's a p-* property element, use the first p-name of the h-* child
>
> Could become:
>
> if it's a p-* property element, use the parsed name property of the h-* child
>
>   • If the parsed name property is a { } structure, use its value property
>   • Else use the first value in the name array
>
> And so on for the other prefixes.
>
> I think this is what php-mf2 does in practice.

This seems pretty sensible to me, though I think your text is incorrect. I suspect you meant something more like this:

if it's a p-* property element and the element's microformat has at least one name property, use the first name property of the h-* child as follows:

  • If the first name property is a { } structure, use its value property
  • Else use the first name property as parsed

The language is a bit tortured, unfortunately, but I think it expresses the spirit of your proposal accurately.
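
A minimal Python sketch of that wording, for the p-* case only. The function name `collapsed_p_value` and its arguments are hypothetical: `child` is the already-parsed microformat, and `parsed_value` is what plain p-* parsing of the property element would give. Falling back to `parsed_value` when the child has no name is an assumption based on the existing final "else" rule.

```python
def collapsed_p_value(child, parsed_value):
    """Pick the `value` for a p-* property element that is also an h-* root."""
    names = child.get("properties", {}).get("name", [])
    if not names:
        # No name at all: assume the existing "else" rule applies.
        return parsed_value
    first = names[0]
    if isinstance(first, dict):
        # A { } structure (e-* result or nested microformat): use its value.
        return first["value"]
    # A plain string: use it as parsed.
    return first


# The h-entry from the php-mf2 example above, whose name comes from e-name.
entry = {
    "type": ["h-entry"],
    "properties": {
        "name": [{"html": "<b>Lorem ipsum</b>", "value": "Lorem ipsum"}]
    },
}

# "fallback text" is a placeholder for the plain p-* parsing result.
print(collapsed_p_value(entry, "fallback text"))  # Lorem ipsum
```

Because the function only looks at the parsed `name` array, it treats explicit and implied names the same way.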

> I wonder what the other parsers do.

Modifying the test so that it is instead:

<div class="h-feed">
  <article class="p-x-articles h-entry">
    Fall through <h1 class="e-name"><b>Lorem ipsum</b></h1>
  </article>
</div>