Closed snarfed closed 6 years ago
Agreed. I believe it was a mistake on my part to mention it in #17 (presumably I just looked at the name parsing), and nobody caught that this expanded the scope of the issue.
To summarise the previous change: we normalised text parsing to use the exact same wording for p-*
, e-*
’s value
, and implied name. Originally to make sure <script>
and <style>
tags would get correctly dropped everywhere, but in the end also included mf2’s <img>
replacement step. Parsers interpret @snarfed’s example HTML the same when it does and does not include a p-name
class wrapping the whole thing:
<p class="h-card"><span class="p-name">
My Name
<img src="http://xyz" />
</span></p>
I previously thought this was fine and would limit the amount of “surprises” for people when expanding from implied to explicit parsing, etc. But seeing how there now seem to be multiple objects in the wild that introduce breakage specifically on implied names, it will probably correct to change this.
As @snarfed said, it doesn’t look like anyone is against this. I would propose one of the following spec changes:
<img>
elements completely from the textContent
sub-step on implied names.Change the line to still allow replacing it with the matching alt
attribute, but never its URL. Leaving only:
replacing any nested
<img>
elements with their alt attribute, if present;
So I wasn’t convinced about dropping alt
completely. Also to keep support for something like:
<a class="h-card" href="/">
<img src="photo.jpg" alt="Zegnat">
</a>
But it turns out that the alt
will always be used if the <img>
is the only content of an element. This leaves only cases where part of the name is expressed by an image while another part is expressed by text. And I am not seeing the use-case for that.
Thanks @sknebel for walking my thoughts through this. 10 minutes of IRC chat log, starting here. I am now also fully on board for dropping the line about replacing <img>
completely.
there now seem to be multiple objects in the wild that introduce breakage specifically on implied names
to clarify, i'm against including src
(and probably alt
) in both explicit and implicit p-name
. i don't want and couldn't use an mf2 parser that includes the URL in the parsed name
for your example HTML with explicit p-name
either.
if that means i should change the issue title, or otherwise changes the request, let me know. happy to discuss more!
to clarify, i'm against including
src
(and probablyalt
) in both explicit and implicitp-name
.
For some context, reversing <img>
replacement in implied names changes a spec addition made earlier this year and only removes a part of it that could be labelled accidental. I have no issues with doing that right now.
Dropping <img>
from all explicit p-*
parsing reverses a May 2013 spec revision. I am really surprised that this behaviour wasn’t already fully implemented in all parsers.
For explicit mark-up, I agree with @sknebel on IRC, spin that to another separate discussion. To quote:
the bigger discussion, check how that relates to aaronpks issue and/or make a new one
because that changes a lot
Apparently this was discussed at the IndieWebSummit. Could someone please look up the conclusion and document it here?
(@Zegnat has volunteered to listen to the session recording and do this)
We agreed to @Zegnat's proposal from May 31: https://github.com/microformats/microformats2-parsing/issues/35#issuecomment-393615508
2+ implementor votes and mf2py implements this now, so I think we're good to update the spec.
Martijn clearly proposed two solutions, and from the change in mf2py the second one, which only partially reverts the change @snarfed has an issue with, got picked. Why that and not remove the entire line? Did people find a reason why keeping the alt
case was important?
I'm happy to accept it if it is consensus, but I'd like to understand why this one was picked, and in the youtube video it seems only to be said "Accept Martijn's proposal", not which one.
Oops, I should have linked the video earlier. I've transcribed some of it below for easier reference. We discussed and agreed on the second proposal.
Topic starts https://youtu.be/0OUy7jjUN5k?t=2h1m20s, then at 2:02:23:
Tantek: so we're just going to use the alt attribute then and not the source Ryan: for implied Tantek: only for implied? Ryan: Martijn's proposal was only for implied [Brief tangential discussion about implied p-name] Tantek: I think that proposed resolution is fine. It's a conservative change, so if we want to open a separate issue for getting rid of
src
in general, but let's solve the particular implied problem immediately
So I believe the agreed update is:
Under implied p-name parsing rules, replace:
replacing any nested
<img>
elements with theiralt
attribute, if present; otherwise theirsrc
attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative;
With:
replacing any nested
<img>
elements with theiralt
attribute, if present;
Updated in spec with this revision: http://microformats.org/wiki/index.php?title=microformats2-parsing&oldid=66871
This issue can be closed now.
right now,
img
src
andalt
values often get included in impliedp-name
s (spec; added in #17).names are usually expected to be relatively short and human readable, but image URLs are often long, not very human-readable, and add little to no value. aaronpk/XRay#53 is related and has details. also, as a downstream user myself, this recently surprised me when it was added to mf2py 1.1.0, and is undesirable enough that i can't realistically upgrade bridgy or granary to it with this behavior.
alt text is usually human readable, but often fairly disconnected and out of context from the rest of the implied name, which results in similarly poor quality text. #16 is related and has details.
a brief straw poll of #microformats this morning found many people who agree, and no one who wanted to keep
src
URLs oralt
values in implied p-name. should we consider dropping that part of the spec?as an example, this html:
currently gets parsed to: