microformats / php-mf2

php-mf2 is a pure, generic microformats-2 parser for PHP. It makes HTML as easy to consume as JSON.
Creative Commons Zero v1.0 Universal

Add optional ID for h-* elements #206

Closed: dshanske closed this issue 4 years ago

dshanske commented 6 years ago

This came out of trying to uniquely identify elements on tantek.com, where there are multiple h-feeds in a page. I'm not sure how to uniquely refer to any of them. The suggestion is that the parser can generate an id property by taking the HTML id of the element that the h-* is on. The element can then be uniquely addressed by adding that id as a fragment to the URL. For example, in the case of tantek.com, http://tantek.com#recent_articles should be his recent articles h-feed.
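A minimal sketch of what that addressing could look like, assuming the parsed output carries the element's HTML id under an `id` key on each h-* item (the shape being proposed, not something this comment confirms). The helper name `feedUrls` and the sample array are hypothetical.

```php
<?php
// Hypothetical helper: map each top-level h-feed that has an id to a
// fragment URL on the page it was parsed from.
function feedUrls(array $items, string $pageUrl): array
{
    $urls = [];
    foreach ($items as $item) {
        $isFeed = in_array('h-feed', $item['type'] ?? [], true);
        if ($isFeed && isset($item['id'])) {
            $urls[$item['id']] = $pageUrl . '#' . $item['id'];
        }
    }
    return $urls;
}

// Hand-written sample shaped like canonical mf2 JSON; not real parser output.
$parsed = ['items' => [
    ['type' => ['h-feed'], 'id' => 'recent_articles', 'properties' => ['name' => ['Recent Articles']]],
    ['type' => ['h-feed'], 'properties' => ['name' => ['Feed without an id']]],
]];

print_r(feedUrls($parsed['items'], 'http://tantek.com'));
```

Feeds without an HTML id are skipped here, so they would still need some other identifier.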

mblaney commented 6 years ago

@dshanske why not just use p-name when u-uid is not available? In the example given, using p-name would give you "Recent Articles" which seems like a pretty good identifier for the h-feed.

dshanske commented 6 years ago

Mostly because a URL like the one in the example makes it easy to share feeds with others.

dshanske commented 6 years ago

https://github.com/microformats/microformats2-parsing/issues/44

mblaney commented 6 years ago

ok if sharing feed urls is the goal here, then sure #id is the most recognizable format. The downside is that all parsers and content need to change to support it. Even with those changes, this is still a new convention that needs to be supported by readers looking for multiple h-feeds on a page.

Here's another new convention that would also work, and not require any changes: http://tantek.com?name=Recent+Articles for the p-name case, or similarly ?uid= if you've discovered u-uid on the h-feed. If there are other query parameters on the url, the h-feed identifier just goes on the end.
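A rough sketch of building such a URL, assuming the page URL carries no fragment of its own. The helper name `feedQueryUrl` and the `page=2` parameter are hypothetical and only illustrate the "goes on the end" rule.

```php
<?php
// Hypothetical helper: append the feed identifier as the last query parameter.
function feedQueryUrl(string $pageUrl, string $key, string $value): string
{
    // Use '?' when the URL has no query string yet, otherwise '&'.
    $separator = (parse_url($pageUrl, PHP_URL_QUERY) === null) ? '?' : '&';
    return $pageUrl . $separator . $key . '=' . urlencode($value);
}

echo feedQueryUrl('http://tantek.com', 'name', 'Recent Articles'), "\n";
echo feedQueryUrl('http://tantek.com?page=2', 'name', 'Recent Articles'), "\n";
```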

dshanske commented 6 years ago

I am fine with continuing the discussion. In my opinion, we just need something we can collectively adopt.

aaronpk commented 6 years ago

Are there any examples of this other than Tantek's website?

mblaney commented 6 years ago

I haven't seen any others @aaronpk, but I don't mind supporting it if the solution is simple enough. Another idea is to borrow from https://indieweb.org/fragmention and basically stick to your original url @dshanske, but use the fragment to match the h-feed. Without the mf2 type in the query you would just need to check it against both p-name and u-uid which I think would be ok.

gRegorLove commented 6 years ago

I'm leaning in favor of @sknebel's proposal in https://github.com/microformats/microformats2-parsing/issues/44, adding a new id attribute to the parsed microformat object. It gives consumers flexibility in how they want to use the id for a feed element.

@mblaney I think those query params would still require updates for readers to understand the special meaning of them. They also are more likely to conflict with existing, non-microformat-related query params.

mblaney commented 6 years ago

@gRegorLove yes both methods require readers to understand a new convention for multiple h-feeds in the same file. I agree that fragments are better than parameters, but I don't see a good reason for including ids in parser output. This is probably a separate issue but if you only want the mf2 identified by the fragment you can use parseFromId.

mblaney commented 6 years ago

actually, going back to the start of this thread, if you pass the original url to parseFromId you get the h-feed you wanted. So the solution is to check if there's a fragment on the url first and try parsing from that point only. If you don't get a result you can try looking for p-name or u-uid on each h-feed in the page that matches the given fragment.
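The two-step check described above could be sketched roughly as follows. This assumes both parses are arrays shaped like canonical mf2 JSON; how the fragment-only parse is obtained (for example via parseFromId) is left to the caller. The helper name `resolveFeed` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper. $fragmentParse is whatever a fragment-only parse
// returned; $fullParse is a parse of the whole page.
function resolveFeed(array $fragmentParse, array $fullParse, string $fragment): ?array
{
    // Step 1: use the fragment-only parse if it produced anything.
    if (!empty($fragmentParse['items'])) {
        return $fragmentParse['items'][0];
    }
    // Step 2: fall back to comparing the fragment with p-name and u-uid
    // on each top-level h-feed of the full parse.
    $wanted = rawurldecode($fragment);
    foreach ($fullParse['items'] ?? [] as $item) {
        if (!in_array('h-feed', $item['type'] ?? [], true)) {
            continue;
        }
        $candidates = array_merge(
            $item['properties']['name'] ?? [],
            $item['properties']['uid'] ?? []
        );
        if (in_array($wanted, $candidates, true)) {
            return $item;
        }
    }
    return null;
}

// Hand-written sample data, not real parser output.
$full = ['items' => [
    ['type' => ['h-feed'], 'properties' => ['name' => ['Recent Articles']]],
]];
$feed = resolveFeed(['items' => []], $full, 'Recent%20Articles');
echo $feed['properties']['name'][0], "\n";
```

Only top-level h-feeds are checked in the fallback; nested feeds would need a recursive walk.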

Zegnat commented 6 years ago

> So the solution is to check if there's a fragment on the url first and try parsing from that point only.

It is very easy to lose context when doing things that way because you lose the rest of the document’s data. Case: use XRay to parse this fragment-URL identified post. It will fail to apply step 4 of the authorship algorithm because it didn’t parse the entire document and therefore misses out on the parent object’s author.

A double parse where you both parse the entire document and only the fragment identified section is a little fragile. The smaller object still needs something that uniquely identifies it if you want to find its location in the bigger tree. So either an ID or uid property.

Making the HTML ids a unique identifier for mf2 objects, in addition to already being a unique identifier for the HTML element, solves that problem. A fragment URL will point to the same data within the HTML and mf2 documents. If we treat the fragment URL for HTML as portable, that same property is automatically inherited by the mf2 document too.
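To illustrate why an id in the mf2 output helps here, a small sketch: parse the whole document once, then locate the object by id while keeping its ancestors. It assumes an `id` key on parsed items and uses hand-written sample data. The helper name `pathToId` is hypothetical, and the author lookup is a simplified stand-in, not the authorship algorithm itself.

```php
<?php
// Hypothetical helper: return the chain of items from the root down to the
// item with the given id, so ancestors stay available to the caller.
function pathToId(array $items, string $id, array $trail = []): ?array
{
    foreach ($items as $item) {
        $path = array_merge($trail, [$item]);
        if (($item['id'] ?? null) === $id) {
            return $path;
        }
        // Nested microformats live under 'children'.
        $found = pathToId($item['children'] ?? [], $id, $path);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Hand-written sample: the post has no author, its parent feed does.
$parsed = ['items' => [[
    'type' => ['h-feed'],
    'properties' => ['author' => ['Example Author']],
    'children' => [
        ['type' => ['h-entry'], 'id' => 'post-1', 'properties' => ['name' => ['A post']]],
    ],
]]];

$path = pathToId($parsed['items'], 'post-1');

// Walk from the target up towards the root and take the first author found.
$author = null;
foreach (array_reverse($path) as $item) {
    if (!empty($item['properties']['author'])) {
        $author = $item['properties']['author'][0];
        break;
    }
}
echo $author, "\n";
```

Because the lookup runs against the full parse, the parent feed's author is still reachable, which a fragment-only parse would have dropped.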

mblaney commented 6 years ago

yes you're right @Zegnat, I was implementing this today and hit that exact problem trying to use parseFromId :-)

Back on multiple h-feeds on a page, if you're going to do discovery and you find an h-feed without an id, you're still going to need to differentiate the feeds somehow. I'm just suggesting that the fragment identifier can be matched against mf2 properties as well.

mblaney commented 6 years ago

speaking of double parsing, there's plenty of times when I'm calling parse and just wanting items. Does anyone think it's worth having a few more flags such as enableAlternates to avoid the extra calls done here? (If yes then probably should continue this discussion in a new issue.)

gRegorLove commented 4 years ago

Implemented in https://github.com/microformats/php-mf2/pull/207