indieweb / authorship

Authorship is an algorithm for discovering and determining the author of a post.

do we need an h-feed authorship algorithm? #4

Status: open. snarfed opened this issue 4 years ago.

snarfed commented 4 years ago

how should we determine an h-feed's author? we discussed this on #indieweb-dev and in #microformats, and the short answer is evidently that we don't know yet. we don't yet have an "authoritative" way, at least for h-feeds without an explicit p-author property. the representative h-card and authorship algorithms are both related, but neither is exactly the answer. the authorship wiki page has an "Authorship for streams of posts" section that's close, but thin.

so, do we need a new h-feed authorship algorithm? or should we extend one of those two algorithms? or something else?

@tantek said here that we still need to do some research and come up with an algorithm. we don't necessarily have the "right" one just yet. so, i've filed this as a place to track research. feel free to close this and move it to the wiki instead if you prefer!

snarfed, h-feed authorship is an interesting problem and worth researching & brainstorming properly, rather than seeing if h-entry approaches “just work”, because that may be overdoing it. Better to collect examples (links, analysis) of h-feed elements that you’re trying to parse, and analyze them to figure out a minimum algorithm based on examples. The “XML approach” would be to assume / require authors/publishers always use an author property and then “just” look for that. While a good starting point, it’s obviously a bad approach to optimize for developer convenience rather than researching reasonable real world examples and making sure to handle them. It’s also a bad approach to “just try” some other similar algorithm to see if it “just works”, as you’re likely making all sorts of bad assumptions by doing so. So I disagree with both “just use representative h-card” and “just use h-entry authorship but for h-feed”. There’s no shortcut here. If you want a good algorithm, it has to start with documenting & analyzing real world publishing examples.

(this was motivated by @alexmingoia's recent issue granary#195, which reports that granary doesn't determine an h-feed's author very well right now. it naively uses the authorship algorithm, which is designed for h-entry, not h-feed.)

cc @kevinmarks @aaronpk. originally filed as microformats/microformats2-parsing#49.

alexmingoia commented 4 years ago

Here is the algorithm I am using to determine a feed's author in the wild:

  1. If h-feed with p-author, author is p-author.
  2. If h-feed with u-url, and that URL has h-card matching u-url, author is that h-card.
  3. If h-feed with u-url, and that URL has no h-card matching u-url, author URL is u-url and name is page <title>.
  4. If h-feed with no u-url or p-author, author URL is page URL and name is page <title>.
  5. If no h-feed then no feed author.
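A minimal Python sketch of the five steps above, assuming the page has already been parsed into microformats2 JSON (plain dicts with "items", "type", "properties", "children"). The names `feed_author`, `iter_items`, `synthesized_card` and the `fetch_mf2` hook (fetch a URL and return its parsed mf2 JSON, or None) are hypothetical. The page <title> is passed in separately because the standard parsed mf2 JSON has no field for it, and the sketch assumes "page <title>" means the title of the page containing the h-feed. It also assumes exact string matching of URLs (no normalization), uses the first h-feed found, and returns a synthesized minimal h-card for steps 3 and 4.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Union

# A parsed microformats2 document or item, as plain JSON-style dicts.
MF2 = Dict[str, Any]
# Hypothetical hook: fetch a URL and return its parsed mf2 JSON, or None on failure.
FetchMf2 = Callable[[str], Optional[MF2]]


def iter_items(items: List[MF2]) -> Iterator[MF2]:
    """Yield each item and, recursively, its nested "children" items."""
    for item in items:
        yield item
        yield from iter_items(item.get("children", []))


def synthesized_card(url: str, name: str) -> MF2:
    """Build a minimal h-card for the fallback steps (3 and 4)."""
    return {"type": ["h-card"], "properties": {"url": [url], "name": [name]}}


def feed_author(page: MF2, page_url: str, page_title: str,
                fetch_mf2: FetchMf2) -> Union[MF2, str, None]:
    feed = next((i for i in iter_items(page.get("items", []))
                 if "h-feed" in i.get("type", [])), None)
    if feed is None:
        return None  # step 5: no h-feed, so no feed author

    props = feed.get("properties", {})
    authors = props.get("author", [])
    if authors:
        return authors[0]  # step 1: explicit p-author (a string or an embedded h-card)

    urls = props.get("url", [])
    if urls:
        feed_url = urls[0]
        # Reuse the current page if u-url points back at it; otherwise fetch it.
        target = page if feed_url == page_url else fetch_mf2(feed_url)
        if target:
            for item in iter_items(target.get("items", [])):
                if ("h-card" in item.get("type", [])
                        and feed_url in item.get("properties", {}).get("url", [])):
                    return item  # step 2: h-card at u-url whose url matches u-url
        return synthesized_card(feed_url, page_title)  # step 3: u-url + page <title>

    return synthesized_card(page_url, page_title)  # step 4: page URL + page <title>
```

For example, an h-feed whose only property is `url: ["https://b.example/"]`, on a page titled "Feed", yields the h-card found at https://b.example/ if one there lists that URL, and otherwise falls back to an h-card with that URL and the name "Feed". Returning a synthesized h-card in steps 3 and 4 keeps the result shape consistent with step 2, so callers can read `properties.url` and `properties.name` uniformly; step 1 can still return a plain string when p-author is not an embedded h-card.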