Closed tantek closed 6 years ago
From the end of the wiki discussion, one straw proposal was:
"any explicit p-* property on an element stops implied p-name"
(this sounds a bit ambiguous and could be reworded, but I think the general intent / principle is workable)
I'm not clear on what that proposal actually means. Does "stop" mean no implied name is generated at all? Does it mean that property with p-* is excluded from the implied name?
Either possibility could be pursued. Let's start documenting examples of existing failures / misbehaviors so we can look at specifics.
Here is an example that came up today: https://huffduffer.com/tags/indieweb
This boils down to a structure like below:
<ol class="h-feed">
<li class="h-entry">
<h3 class="p-name">Episode Name</h3>
<div class="e-content">episode description...</div>
<audio class="u-audio" src="..."></audio>
<p class="p-author h-card">...</p>
</li>
</ol>
The result of the current implied p-name rule is all the contents of the individual h-entrys end up being in the implied p-name of the h-feed.
Would it make sense for implied p-name to skip content from h-* children? It would mean in the above case that the implied p-name for the above h-feed is empty.
Restating for clarity, and adding children as another way to stop implying p-name
"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"
Need consensus positive feedback from parser developers and one proof of implementation to proceed (and no objections obviously). Yes you can thumbs-up to indicate positive feedback 😃
Here's a Bridgy example: https://brid-gy.appspot.com/comment/twitter/miklb/948601132397588481/949306079623766016
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"author": [
{
"type": [
"h-card"
],
"properties": {
"uid": [
"tag:twitter.com,2013:deaaqua"
],
"numeric-id": [
"26367081"
],
"name": [
"Amanda Rocksalot"
],
"nickname": [
"deaaqua"
],
"url": [
"https://www.instagram.com/deaaqua/",
"https://twitter.com/deaaqua"
],
"photo": [
"https://pbs.twimg.com/profile_images/1811120958/Buttercups_and_Books_by_pinkparis1233.jpg"
]
},
"value": "Amanda Rocksalot"
}
],
"category": [
{
"type": [
"h-card"
],
"properties": {
"uid": [
"tag:twitter.com,2013:GatoIshwary"
],
"name": [
"Gato Ishwary"
],
"url": [
"https://twitter.com/GatoIshwary"
]
},
"value": "https://twitter.com/GatoIshwary"
},
{
"type": [
"h-card"
],
"properties": {
"uid": [
"tag:twitter.com,2013:miklb"
],
"name": [
"Michael Bishop"
],
"url": [
"https://twitter.com/miklb",
"https://miklb.com/"
]
},
"value": "https://twitter.com/miklb"
}
],
"uid": [
"tag:twitter.com,2013:949306079623766016"
],
"url": [
"https://twitter.com/deaaqua/status/949306079623766016"
],
"video": [
"https://video.twimg.com/tweet_video/DSyc1j3VQAswKgR.mp4"
],
"in-reply-to": [
"https://twitter.com/GatoIshwary/status/949250988946395137",
"https://miklb.com/2018/01/3012/"
],
"published": [
"2018-01-05T15:46:21+00:00"
],
"name": [
"tag:twitter.com,2013:949306079623766016\n \n 2018-01-05T15:46:21+00:00\n Amanda Rocksalot\n\n deaaqua\n \n\n https://twitter.com/deaaqua/status/949306079623766016\n \n \n \n \nYour browser does not support the video tag. Click here to view directly. \n Gato Ishwary\n \n \n \n\n \n Michael Bishop"
]
}
}
],
"rels": {},
"debug": {
"package": "https://packagist.org/packages/mf2/mf2",
"version": "v0.3.2",
"note": [
"This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
"Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
]
}
}
I encountered a similar case today working on https://github.com/aaronpk/XRay/issues/52 where a photo
property seemed to come out of nowhere.
In this example, I would not expect the photo
property to be implied:
<div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div>
From the authoring perspective, I would expect the act of defining the e-content
class should prevent the img
tag inside from turning into an implied property on the parent h-entry
.
The implied photo
in @aaronpk’s example is done through the rule:
- else if
.h-x>:only-child:not[.h-*]>img[src]:only-of-type:not[.h-*]
, then use thatimg
’ssrc
for photo
This rule looks pretty complex. It seems to exist mostly in case of some wrapper element being inserted between the root h-x
and the data we want to use for extracting implied values. That’s what I assume to be the original use case because the in-between element has to be the only element within the root h-x
.
Either the existence of explicit properties should stop the implied parsing, or the rules should incorporate some form of limitation in case of the wrapper element. Adding :not[.p-*]:not[.u-*]:not[.dt-*]:not[.e-*]
to the :only-child
in-between element selector would be an option here, but I personally don’t like making these implied values rules even bigger and more convoluted. They are already a source of many questions.
Just a thought I just had in chat when I realised e-content
wasn’t going to stop a name
property to be implied per @tantek’s proposed new limitation:
"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"
Is there any reason why we limit this to p-
properties only and not all explicit properties? I still would not be able to remove the redundant p-name
from my blog posts. Here is my HTML:
<article class="h-entry">
<main class="e-content p-name">
<p>Post</p>
</main>
<footer>
<p>Published on <time class="dt-published" datetime="2017-11-07T13:27:49+01:00">7<sup>th</sup> November 2017 13:27:49</time>.</p>
<p class="editorlink">[<a rel="edit" href="#">edit</a>] [<a href="#" class="u-url u-uid">permalink</a>]</p>
</footer>
</article>
This h-entry
includes an e-
property, a dt-
property, and two u-
properties. So without p-name
there are no other p-
properties to stop the name from being implied.
Tantek’s home page is another example of h-entry
markup that will need to keep explicitly setting p-name
because it would get implied wrong otherwise:
<li class="h-entry hentry as-note">
<p class="p-name entry-title e-content entry-content article">Post</p>
<span class="info footer"><a href="#" class="dt-published published dt-updated updated u-url u-uid"><time class="value" datetime="05:11-0800">05:11</time> on <time class="value">2018-01-14</time></a></span>
</li>
It (like mine) has several other properties defined for a post but no additional p-
properties.
Today I spotted another live example of a post where the parsed name
will not be fixed by the proposed change. Hat tip to @gRegorLove, who linked to Upon 2020 first. See chat. The post [he linked to]() includes a p-in-reply-to
and will indeed be fixed by the proposed change. But…
@jernst’s website uses the following HTML for note posts (edited to remove non-mf2 classes and emty unrelated attributes for easier reading):
<li id="" class="h-entry">
<a href="" title="Posts by jernst ( @jernst )" class="">
<img alt='' src='' srcset='' class='' height='48' width='48' />
</a>
<h4>
<a href="" title="Posts by jernst ( @jernst )">jernst</a>
<span class="">
<abbr class="dt-published" title="2018-01-24T16:56:18Z">
08:56 <em>on</em> January 24, 2018
</abbr>
<span class="">
<a href="" class="u-url" title="Permalink">Permalink</a>
<a rel="nofollow" class="" href="">Log in to leave a Comment</a>
</span>
<span class=""> </span>
</span>
</h4>
<div id="" class="e-content">
<p>
My “invited guest post” (is that a thing?) on opportunities
of the #IndieWeb for small businesses went live on GoDaddy’s site:
<a href="" rel="nofollow">https://www.godaddy.com/garage/indieweb-facebook-opportunities/</a>
</p>
</div>
⋮
</li>
This has a u-url
, dt-published
, and e-content
. Even under the current proposal, as there is no other p-
property, the name
will be implied and include meta data. (As shown by Loqi).
Also note that if the reply post linked by @gRegorLove had used u-in-reply-to
it wouldn’t be fixed either. This is a great example of why I do not think we should limit it to p-
properties. That post would not be expressing any difference in intent between u-in-reply-to
and p-in-reply-to
but the parsers will assign it an implied name in one case and not for the other. I think this will lead to extra confusion.
Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present. Not sure if this a key issue.
Documenting from yesterday’s chat, because nobody could remember this and I can’t find it elsewhere:
So I think the (as of now) latest proposed spec change would be:
p-name
MUST NOT be implied if there are any explicit p-*
or e-*
properties, or any nested microformats.Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present.
Here too I will just document an answer given in chat, this time by @aaronpk:
may just mean it has to be released as a major version number [of the parsers]
Assuming something like semver is being used, any major version bump should signify possible API changes to the user. I too don’t think that would be an issue.
There might be an issue if someone is using parsers-as-a-service, e.g. always getting their mf2 parser output from php.microformats.io
. But I don’t think anyone ever advertised their online parsers as a service?
I (and likely others) use xray.p3k.io as a service, so I will have to consider what to do in that case. It doesn't return the Microformats JSON, it converts it to its jf2 format first. I may just return an empty string for name
if there is no mf2 name
property.
It doesn’t return the Microformats JSON, it converts it to its jf2 format first.
It might be worth opening an issue on jf2 to see if they want to keep an explicit name
property. Their “author” syntax says it “consists of a name” and more, but that’s not marked as a MUST. For a jf2feed on the other hand name
is a SHOULD.
The real question is, do you see any reasons for postponing this change because of your use of a mf2 parser as a service? I think not?
Nope, wasn't intending to hold things up, just wanted to put that there for the record.
I agree with the current proposal of having p-*
and e-*
and h-*
stop the implied name.
Is there a consensus on this issue? If yes, I can look into adding this to mf2py.
At @kartikprabhu’s request here are some super simplified examples of HTML where unwanted metadata (a footer
of an article
) would be swept up in the name
even though I only want content
or like-of
to show. Before and afters have been included. Output taken from the PHP parser.
e-content
and no p-name
<article class="h-entry">
<div class="e-content">
<p>Wanted content.</p>
</div>
<footer>
<p>Footer to be ignored.</p>
</footer>
</article>
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"content": [
{
"html": "\r\n <p>Wanted content.</p>\r\n ",
"value": "Wanted content."
}
],
"name": [
"Wanted content. \r\n \r\n Footer to be ignored."
]
}
}
],
"rels": {}
}
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"content": [
{
"html": "\r\n <p>Wanted content.</p>\r\n ",
"value": "Wanted content."
}
]
}
}
],
"rels": {}
}
p-content
and no p-name
<article class="h-entry">
<div class="p-content">
<p>Wanted content.</p>
</div>
<footer>
<p>Footer to be ignored.</p>
</footer>
</article>
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"content": [
"Wanted content."
],
"name": [
"Wanted content. \r\n \r\n Footer to be ignored."
]
}
}
],
"rels": {}
}
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"content": [
"Wanted content."
]
}
}
],
"rels": {}
}
p-name
<article class="h-entry">
<div class="u-like-of h-cite">
<p>I really like <a class="p-name u-url" href="http://microformats.org/">Microformats</a></p>
</div>
<footer>
<p>Footer to be ignored.</p>
</footer>
</article>
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"like-of": [
{
"type": [
"h-cite"
],
"properties": {
"name": [
"Microformats"
],
"url": [
"http://microformats.org/"
]
},
"value": "http://microformats.org/"
}
],
"name": [
"I really like Microformats \r\n \r\n Footer to be ignored."
]
}
}
],
"rels": {}
}
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"like-of": [
{
"type": [
"h-cite"
],
"properties": {
"name": [
"Microformats"
],
"url": [
"http://microformats.org/"
]
},
"value": "http://microformats.org/"
}
]
}
}
],
"rels": {}
}
Implemented in mf2py here https://github.com/kartikprabhu/mf2py/tree/implied-name-fix Will push to the main version if this gets consensus and makes it to the spec.
With an implementation for mf2py by @kartikprabhu and an intent to implement in mf2php by @gRegorLove, is that enough to update the spec? @tantek?
it seems the output from mf2py depends on which internal HTML parser is being used ( see: https://github.com/kartikprabhu/mf2py/issues/58#issuecomment-366533803 ), but this should count as an internal bug for mf2py to be fixed.
So I am +1 for this being included in the spec.
implied-name-stopping has been implemented in an experimental version of mf2py.
so +1
This has been added to the spec, at revision 20:27, 4 March 2018. Resolved.
This is a split-off of part of http://microformats.org/wiki/microformats2-parsing-issues#implied_properties_when_an_explicit_class_is_provided that was left unresolved without consensus (since that issue was resolved with just dealing with u-url and nothing more).
Experience has shown there are a number of instances where implied p-name produces something unhelpful. Typically this happens with otherwise large microformats which for some reason omitted the name (e.g. an h-feed, or h-entry that has no author supplied name etc.).
The use-case for which implied p-name was designed was for small microformats, e.g. just a hyperlink with h-* class on it, or maybe just a simple set of nested elements (without siblings). It has some similar use-cases too. But by the point you have elements with multiple explicit properties specified inside, rarely does the current implied p-name rule produce anything useful.
One thing that would help is links to specific examples where excessive or otherwise "useless" p-names are being implied, where no p-name would actually be preferable (from a consuming code point of view).