metaformats: should we distinguish the parsed output item more explicitly?

microformats / metaformats

Issue tracking for the metaformats specification, an extension to the microformats2 parsing specification, for parsing invisible data published in HTML meta tags

https://microformats.org/wiki/metaformats

Creative Commons Zero v1.0 Universal


metaformats: should we distinguish the parsed output item more explicitly? #2

Open snarfed opened 11 months ago

snarfed commented 11 months ago

Right now, if a parser following https://microformats.org/wiki/metaformats finds eligible metaformats, it generates an h-entry and appends it to the returned items. There's no way to distinguish this item from real mf2 items, though, which is unfortunate. As an implementor, I could use one! Especially for interpreting home page metaformats as an h-card, e.g. microformats/metaformats#3, but also for non-homepage pages. Should we include a new property? New type? (I assume not.) Something else? cc @tantek

sknebel commented 11 months ago

Given that, as far as I can see, they don't really participate in the nesting of objects (i.e. a metaformats-parsed object is not going to be a child or property value of an mf2-parsed object, nor vice versa), they could be sorted into a separate list, e.g. metaformats-items. Alternatively, they could have an extra flag on the same level as type.
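To make the two options concrete, here is a rough sketch of what each output shape might look like. The key names `metaformats-items` and `source` are only placeholders taken from this thread, not anything the spec defines, and `metaformats_items` is a hypothetical helper.

```python
# Hypothetical parsed items; the property values are made up for illustration.
mf2_item = {"type": ["h-entry"], "properties": {"name": ["A real mf2 post"]}}
meta_item = {"type": ["h-entry"], "properties": {"name": ["Title from meta tags"]}}

# Option A: metaformats results live in their own top-level list.
separate_list = {"items": [mf2_item], "metaformats-items": [meta_item]}

# Option B: same list, with an extra flag on the same level as "type".
flagged = {"items": [mf2_item, {**meta_item, "source": "metaformats"}]}


def metaformats_items(parsed):
    """Return the metaformats-derived items under either hypothetical shape."""
    separate = parsed.get("metaformats-items", [])
    in_items = [
        item for item in parsed.get("items", [])
        if item.get("source") == "metaformats"
    ]
    return separate + in_items
```

Either shape lets a consumer find the metaformats item; the difference is whether code that only reads `items` ever sees it.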

aciccarello commented 11 months ago

I wondered about this too in microformats/microformats-parser#229.

Should there be a property identifying the mf as being parsed from metaformats, in case someone wants to clean up messy meta tag content?

I'd prefer not to put them in a separate list, so a consumer of the parsed output doesn't need to do anything extra. So far I haven't personally needed to know if an output is from metaformats, but I could see a property identifying it being useful.

angelogladding commented 11 months ago

I think adding a new property meta-item keeps things clean and explicit. In Python:

if parsed["meta-item"]:

vs. eg.

if parsed["items"] and parsed["items"][-1].get("source") == "metaformats":

I believe mf2py can toggle metaformats parsing on by default immediately if we can keep items as is and use meta-item experimentally -- see https://github.com/microformats/mf2py/pull/213#issuecomment-1837690945

snarfed commented 11 months ago

As @aciccarello mentioned, the problem is that a separate list forces every consumer to handle it explicitly. One of the benefits of the current metaformats spec is that it lets current mf2 consuming code (choose to) benefit from metaformats automatically, without any changes. A new top-level field preserves that; a separate list doesn't.
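A small sketch of that difference, assuming a page with no real mf2 and only metaformats. `first_item_name` is a hypothetical stand-in for existing consumer code, and the `source` and `metaformats-items` keys are placeholder names from this thread, not settled spec.

```python
def first_item_name(parsed):
    """Stand-in for existing mf2 consumer code that only knows about "items"."""
    for item in parsed.get("items", []):
        names = item.get("properties", {}).get("name")
        if names:
            return names[0]
    return None


meta_item = {"type": ["h-entry"], "properties": {"name": ["Title from meta tags"]}}

# Extra field on the item: the unchanged consumer still finds the fallback.
with_flag = {"items": [{**meta_item, "source": "metaformats"}]}

# Separate list: the unchanged consumer sees nothing until it is updated.
with_separate_list = {"items": [], "metaformats-items": [meta_item]}

name_with_flag = first_item_name(with_flag)
name_with_separate_list = first_item_name(with_separate_list)
```

Here `name_with_flag` is the meta tag title, while `name_with_separate_list` is `None` because the consumer never looks at the second list.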

angelogladding commented 11 months ago

I do like automatic fallback for entries. Now I better see what you guys are talking about.

mf2util will need to be updated to look for the new top-level field and ignore it when interpreting a feed, but everything else in that library should just work (again, by simply operating on the first item).

>>> mf2json = mf2py.parse(url="https://zeldman.com", metaformats=True)
>>> homepage_feed = mf2util.interpret_feed(mf2json, "https://zeldman.com")
>>> homepage_feed["entries"][-1]["name"]
'Zeldman on Web and Interaction Design'

The fix will look something like this, which is perfectly fine:

if feed["entries"] and feed["entries"][-1].get("source") == "metaformats":
    feed["entries"].pop()

And you'll never actually need to look up the meta item, so I was optimizing for a non-existent case with:

if parsed["meta-item"]:

So keeping it in items and adding a top-level field does make good sense.
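For what it's worth, a slightly more defensive version of that fix could filter by the flag rather than by position. This is only a sketch: `drop_metaformats_entries` is a hypothetical helper, the `source` field name is still undecided, and it assumes the flag survives into the interpreted feed's entries, which I haven't verified against mf2util.

```python
def drop_metaformats_entries(feed, flag="source", value="metaformats"):
    """Return a copy of the feed without entries flagged as metaformats.

    Works on any dict with an "entries" list; the flag name and value are
    placeholders, not part of any spec.
    """
    entries = [
        entry for entry in feed.get("entries", [])
        if entry.get(flag) != value
    ]
    return {**feed, "entries": entries}


# Hypothetical interpreted feed with one real entry and one metaformats entry.
feed = {
    "name": "Example feed",
    "entries": [
        {"name": "A real post"},
        {"name": "Title from meta tags", "source": "metaformats"},
    ],
}
cleaned = drop_metaformats_entries(feed)
```

This leaves the original `feed` untouched and also copes with an empty or missing `entries` list.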