pfefferle / wordpress-webmention

A Webmention plugin for WordPress
https://wordpress.org/plugins/webmention/
MIT License

Parsing a page with mf2 annotations using the meta parser #492

Open nex3 opened 3 days ago

nex3 commented 3 days ago

I'm testing webmentions from my blog using this post. When I run it through the Webmention tester for https://seaslug.garden, which is using this plugin, it reports:

"published": {
  "date": "2024-10-11 07:23:12.614721",
  "timezone_type": 3,
  "timezone": "UTC"
},
"author": {
  "type": "card",
  "name": "https://nex-3.com"
}

It's definitely parsing at least some h-entry metadata correctly: its content field is clearly an HTML-to-ASCII version of my post's e-content. But other metadata is being lost.

When I run the PHP microformat parser directly through its web endpoint, it parses the author and date correctly. This suggests that the issue is somewhere in this plugin. It's worth noting that both p-author and dt-published are nested within u-url and u-uid—this is valid per spec, but it's possible it's causing failures here.

pfefferle commented 3 days ago

Hey @nex3 👋

I just tested your blog post on my site's testing endpoint and got this result:

{
    "published": {
        "date": "2024-09-27 07:38:00.000000",
        "timezone_type": 2,
        "timezone": "Z"
    },
    "updated": {
        "date": "2024-10-10 10:11:00.000000",
        "timezone_type": 2,
        "timezone": "Z"
    },
    "url": "https:\/\/nex-3.com\/blog\/reblogging-posts-with-h-entry\/",
    "author": {
        "type": "card",
        "name": "Natalie",
        "url": "https:\/\/nex-3.com\/"
    },
    "site_name": "nex-3.com",
    "content": "[stripped by me because too much HTML ;)]",
    "summary": "Natalie wrote: Once I add the ability to embed arbitrary blog posts from other blogs on here it's over. I'm gonna be reblogging like a…",
    "response_type": "mention"
}

Do you use the latest version of the plugin?

pfefferle commented 3 days ago

The plugin supports multiple semantics and it seems that you posted the result of the Webmentions Meta-Parser.

nex3 commented 3 days ago

Okay, it makes some sense that none of this is coming through microformats.

The Webmention plugin version is 5.3.3 (IndieWeb is 4.0.5). I don't see any settings for which parser to use, though. I had expected it to prefer h-entry metadata if it was available. Is there a setting that should be switched to make that work?

Edit: I do see that the IndieWeb "Getting Started" page says

Install and activate the Webmentions and Semantic Linkbacks plugins. These will allow you to receive responses such as replies, likes, etc from other IndieWeb sites. You can configure it in the Webmention Settings

However, I don't see a plugin named "Semantic Linkbacks" either in the IndieWeb Extensions page or the main plugin repository.

pfefferle commented 3 days ago

We have to change that; it is only the Webmention plugin now! You do not have to configure anything: the plugin aggregates the info from all parsers. You can experiment with that under Tools > Webmention.
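
As an illustration of what "aggregates the info from all parsers" could mean, here is a minimal Python sketch. It is not the plugin's actual code (the plugin is written in PHP); the function name `aggregate`, the field names, and the rule that earlier parsers win while later ones only fill gaps are all assumptions made for this example.

```python
def aggregate(results):
    """Merge per-parser results into one dict.

    Assumption for this sketch: parsers are listed in priority order,
    so a later parser only fills fields that are still missing or empty.
    """
    merged = {}
    for result in results:
        for key, value in result.items():
            is_empty = value in (None, "", [], {})
            if not is_empty and key not in merged:
                merged[key] = value
    return merged


# Hypothetical outputs of a microformats parser and a meta-tag parser.
mf2_result = {"content": "Post text", "author": None}
meta_result = {
    "content": "Meta description",
    "author": {"type": "card", "name": "https://nex-3.com"},
}

merged = aggregate([mf2_result, meta_result])
print(merged["content"])         # Post text
print(merged["author"]["name"])  # https://nex-3.com
```

Under these assumptions, a microformats parser that finds content but no author would leave the author field to be filled by a weaker source, which would match the mixed result reported at the top of this issue.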

pfefferle commented 3 days ago

Ah, found it! You use an h-entry inside an h-entry. The code searches for the target URL and treats the h-entry that directly surrounds it as the post, and that inner h-entry does not have an author!

You have to give the sub-entry some kind of property, for example u-in-reply-to.

Try:

<div class="generic-post-wrapper u-in-reply-to h-entry">

Instead of h-entry, the best practice is to use h-cite: https://indieweb.org/reply-context#Markup

pfefferle commented 3 days ago

You generally should not nest h-* objects inside of h-* objects without also giving them a property class. The only exception I know of is h-feed > h-entry.
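
The lookup described in the comments above can be sketched in Python on hand-written data shaped like parsed microformats2 JSON. This is a simplified illustration, not the plugin's PHP code: the helper names `mentions` and `innermost_entry`, the example URL, and the sample data are all hypothetical. It relies on the mf2 convention that a nested microformat without a property class lands in the parent's "children" array, while one with a property class lands under "properties".

```python
TARGET = "https://example.com/original-post"  # hypothetical webmention target


def mentions(item, target):
    """True if one of the item's own property values contains the target URL."""
    for values in item.get("properties", {}).values():
        for value in values:
            if isinstance(value, str) and value == target:
                return True
            if isinstance(value, dict) and target in value.get("html", ""):
                return True
    return False


def innermost_entry(items, target):
    """Return the deepest h-entry that mentions target, descending only into
    property-less nested microformats (the "children" array)."""
    for item in items:
        nested = innermost_entry(item.get("children", []), target)
        if nested is not None:
            return nested
        if "h-entry" in item.get("type", []) and mentions(item, target):
            return item
    return None


link = '<a href="%s">original</a>' % TARGET
inner = {"type": ["h-entry"],
         "properties": {"content": [{"html": link, "value": "original"}]}}
author = {"type": ["h-card"], "properties": {"name": ["Natalie"]}, "value": "Natalie"}
outer_content = {"html": "<p>Commentary</p>" + link, "value": "Commentary original"}

# Nested h-entry without a property class: it ends up in "children".
bare = {"type": ["h-entry"],
        "properties": {"author": [author], "content": [outer_content]},
        "children": [inner]}

# Nested h-entry with a property class such as u-in-reply-to.
with_property = {"type": ["h-entry"],
                 "properties": {"author": [author],
                                "in-reply-to": [inner],
                                "content": [outer_content]}}

print("author" in innermost_entry([bare], TARGET)["properties"])           # False
print("author" in innermost_entry([with_property], TARGET)["properties"])  # True
```

In the first case the search stops at the authorless inner entry; in the second the inner entry is a property value, so the outer entry (with its author) is selected.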

nex3 commented 3 days ago

Thanks for digging into this!

I think I understand what you're saying—the current parser only goes up one "level" of h-entry from the link in question.

I think my current markup semantically represents my intentions as best as possible within the spec—my post, by me, contains an embedded post by another author. It's not precisely a "repost" because it adds substantial additional content, and it's not precisely a "reply" because the text isn't in conversation with the original. I don't directly embed this as a u-repost-of because it's a full representation of that post itself and not just a citation. Even if I wanted to bend the semantics of what a "citation" is, h-cite doesn't allow rich HTML e-content, only a p-content string.

It seems to me there are three potential paths forward, and I'm curious which you think is best (or if I'm missing something):

  1. Accept my current markup as an accurate way to represent the semantics of "a post that contains multiple posts" and change the plugin logic to use the outermost h-entry as the canonical source for authorship and other metadata, rather than the innermost.

  2. Change the markup I generate to fudge the h-entry spec a bit and generate class="u-repost-of h-entry" for nested embeds. Possibly file an issue to advocate for this becoming the recommended representation of what I'm expressing here.

  3. Follow the h-entry spec and generate class="u-repost-of h-cite" for nested embeds, but then fudge the h-cite spec instead to use e-content rather than p-content for rich HTML contents. Possibly file an issue to advocate for this becoming the recommended representation of what I'm expressing here.

I guess there's also

  4. Become compatible with this plugin while following the spec to the letter by not using microformat metadata at all for nested embeds, or by making them h-cites with no content at all.

...but losing semantic markup for the embedded posts makes this seem like clearly worse than the first three options.
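
For comparison, option 1 (taking metadata from the outermost h-entry) could look roughly like the following Python sketch over mf2-style JSON. It is only an illustration under stated assumptions: `subtree_mentions` and `outermost_entry` are hypothetical names, the sample data is hand-written, and the plugin itself is PHP and may need to handle cases this sketch ignores.

```python
def subtree_mentions(item, target):
    """True if target appears anywhere in the item: its own properties,
    microformats nested as property values, or property-less children."""
    for values in item.get("properties", {}).values():
        for value in values:
            if isinstance(value, str) and value == target:
                return True
            if isinstance(value, dict):
                if target in value.get("html", ""):
                    return True
                if "type" in value and subtree_mentions(value, target):
                    return True
    return any(subtree_mentions(child, target) for child in item.get("children", []))


def outermost_entry(items, target):
    """Return the first (outermost) h-entry whose subtree mentions target.
    Containers that are not h-entries (for example h-feed) are descended into."""
    for item in items:
        if "h-entry" in item.get("type", []):
            if subtree_mentions(item, target):
                return item
        else:
            found = outermost_entry(item.get("children", []), target)
            if found is not None:
                return found
    return None


TARGET = "https://example.com/original-post"  # hypothetical webmention target
author = {"type": ["h-card"], "properties": {"name": ["Natalie"]}, "value": "Natalie"}
inner = {"type": ["h-entry"], "properties": {"url": [TARGET]}}
outer = {"type": ["h-entry"],
         "properties": {"author": [author], "name": ["My post"]},
         "children": [inner]}
feed = {"type": ["h-feed"], "properties": {}, "children": [outer]}

entry = outermost_entry([feed], TARGET)
print(entry["properties"]["author"][0]["properties"]["name"][0])  # Natalie
```

The trade-off is that the outermost entry is not always the right source: on a page where the link really belongs to an embedded post by someone else, this approach would attribute it to the outer author.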

pfefferle commented 3 days ago

I am not sure if it is by the spec, to be honest. I have not seen an h-entry inside of an h-entry without putting it in a property. But we could discuss that here: https://chat.indieweb.org/microformats/2024-10-11#bottom

I will see how much work it would be to support that case, but I fear that we could break something else 🤔

You could also use other properties, by the way; it does not have to be in-reply-to. We could also discuss that in the Microformats chat: https://chat.indieweb.org/microformats/2024-10-11#bottom

nex3 commented 3 days ago

I'll hop in chat to talk this over once I'm able.

Of option 2 or 3, which do you think is better for the short term?

pfefferle commented 3 days ago

The thing is: even if we fix it for WordPress, it might still break on other endpoints, so I would go with the most compatible solution!