microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

Ignore Tailwind's h- prefixes #59

Open tommorris opened 1 year ago

tommorris commented 1 year ago

Copied from mf2py issue 170.


Tailwind is a CSS framework that uses h- prefixed values for styling, specifically height.

Therefore, the following example won't parse well in any correct implementation of microformats2.

<html class="h-full">
...
<span class="h-card p-name">Johnny Appleseed</span>
...
</html>

The options for dealing with this seem to be:

  1. make Tailwind go away permanently (honestly, I'm not a fan, but each to their own)
  2. tell HTML authors to not use Tailwind with height properties if they're also using microformats2
  3. ignore Tailwind's height properties (by default; ideally there should always be a way to get strict parsing) - the parsing spec apparently already excludes h-[0-9]*, but we could also exclude the other h- properties used by Tailwind

Given Tailwind doesn't seem to be going away, and given priority of constituencies ("consider users over authors over implementors over specifiers over theoretical purity"), the third option may be the least bad, even if it does make parsing marginally more complicated.

jalcine commented 1 year ago

If we go with the third option, I'd lean towards suggesting that parsers offer an option to restrict h- values to known ones, namely those with defined specifications.

This could be an evolving list, which is already done with Micropub extensions and new specifications of objects themselves. This could also break backcompat with those parsers that fall back to treating an item as an h-entry or the like, but I see that as very unlikely.

(Originally published at: https://jacky.wtf/2022/11/9QFM)

barnabywalters commented 1 year ago

As the list of tailwind-specific false-positive root classnames is short and knowable, it’s possible to take parsed mf2 output and restructure it as if the false positives had not been there. This is then available as a customisable post-processing step which consumers can choose to do, or not. There’s a draft implementation here https://github.com/barnabywalters/php-mf-cleaner/blob/main/src/functions.php#L629 (relevant test here for reference)
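A minimal sketch of what such a post-processing step could look like, written in Python against the canonical mf2 JSON shape (`items`, `type`, `properties`, `children`). This is not the linked php-mf-cleaner code: the function name `remove_false_positive_roots` and the `TAILWIND_FALSE_POSITIVES` set are hypothetical, and the set only contains the four class names mentioned in this thread. It also only hoists nested microformats out of a dropped wrapper; it does not clean nested items inside the properties of items that are kept.

```python
# Illustrative list only: the four class names named in this thread.
TAILWIND_FALSE_POSITIVES = frozenset({"h-full", "h-auto", "h-min", "h-max"})


def remove_false_positive_roots(items, false_positives=TAILWIND_FALSE_POSITIVES):
    """Return a copy of `items` with false-positive root types removed.

    An item whose types are all false positives is dropped, and any
    microformats nested in its properties or children are hoisted up
    to take its place.
    """
    cleaned = []
    for item in items:
        types = [t for t in item.get("type", []) if t not in false_positives]
        children = remove_false_positive_roots(item.get("children", []), false_positives)
        if types:
            # Keep the item, minus any false-positive types.
            new_item = dict(item, type=types)
            if children:
                new_item["children"] = children
            else:
                new_item.pop("children", None)
            cleaned.append(new_item)
        else:
            # Drop the wrapper; hoist microformats embedded as property values.
            # The "value" key only makes sense for an embedded item, so strip it.
            nested = [
                {k: v for k, v in value.items() if k != "value"}
                for values in item.get("properties", {}).values()
                for value in values
                if isinstance(value, dict) and "type" in value
            ]
            cleaned.extend(remove_false_positive_roots(nested, false_positives))
            cleaned.extend(children)
    return cleaned


# Roughly what the example from the first comment would parse to.
parsed = {
    "items": [
        {
            "type": ["h-full"],
            "properties": {
                "name": [
                    {
                        "type": ["h-card"],
                        "properties": {"name": ["Johnny Appleseed"]},
                        "value": "Johnny Appleseed",
                    }
                ]
            },
        }
    ]
}
cleaned = remove_false_positive_roots(parsed["items"])
```

Because it runs on parser output, a consumer can apply this step, skip it, or pass a different set of class names without any change to the parser itself.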

capjamesg commented 1 year ago

@barnabywalters Thank you for sharing your draft implementation!

I would be in favour of collating a whitelist of all accepted microformats values, rather than ignoring tailwind-specific values. If tailwind adds new attributes, we will need to release new versions of all parsers, which is an external decision made by the tailwind team. I don't think we should be reactive to external tools that use the h-* pattern.

gRegorLove commented 1 year ago

The generic parsing of vocabularies is a key feature of microformats2 and improvement from mf1, so I don't think an allowlist can be part of the core parsers.

capjamesg commented 1 year ago

@gRegorLove Thanks for sharing! Do you think we should take any action on this at the parser spec level?

gRegorLove commented 1 year ago

I don't see an immediately obvious way to. Most of the Tailwind height selectors have numbers (.h-0) so will be ignored during parsing:

The "*" for root (and property) class names consists of an optional vendor prefix (series of 1+ number or lowercase a-z characters i.e. [0-9a-z]+, followed by '-'), then one or more '-' separated lowercase a-z words.

It's that handful like .h-full, .h-auto, .h-min and .h-max that will be parsed, and I don't see an easy way around that without going down the route of allowlists, unfortunately. It does feel like more of an mf2 utility add-on like Barnaby's implementation.
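To make the distinction concrete, here is a small Python sketch of the root class name rule as quoted above. The regular expression is one reading of that sentence, not text taken from the spec or from any parser, and `is_root_class` is a hypothetical helper name.

```python
import re

# "h-", an optional vendor prefix ([0-9a-z]+ followed by "-"),
# then one or more "-"-separated lowercase a-z words.
ROOT_CLASS = re.compile(r"h-(?:[0-9a-z]+-)?[a-z]+(?:-[a-z]+)*")


def is_root_class(name: str) -> bool:
    """True if `name` would be treated as a microformats2 root class name."""
    return ROOT_CLASS.fullmatch(name) is not None


results = {
    name: is_root_class(name)
    for name in ["h-0", "h-12", "h-1/2", "h-full", "h-auto", "h-min", "h-max", "h-card"]
}
# Numeric Tailwind classes (h-0, h-12, h-1/2) are rejected;
# h-full, h-auto, h-min and h-max match just like h-card does.
```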

sknebel commented 1 year ago

It'd be interesting to experiment with user-provided filtering inside the parser (in many ways easier than trying to save it in post-processing), but I'm wary of adding this to the spec without extensive experimentation.
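As a starting point for that kind of experiment, the following Python sketch shows one possible shape for a user-provided filter applied while root class names are being collected. Everything here is an assumption for illustration: `root_types` and its `keep` parameter are hypothetical and do not correspond to an option in mf2py or any other parser, and the regular expression is one reading of the spec sentence quoted earlier in the thread.

```python
import re
from typing import Callable, List

ROOT_CLASS = re.compile(r"h-(?:[0-9a-z]+-)?[a-z]+(?:-[a-z]+)*")


def root_types(class_attr: str, keep: Callable[[str], bool] = lambda name: True) -> List[str]:
    """Collect root class names from a class attribute value.

    `keep` is a hypothetical user-supplied predicate; the default keeps
    every syntactically valid root class name, i.e. strict parsing.
    Names are de-duplicated and sorted.
    """
    return sorted(
        {name for name in class_attr.split() if ROOT_CLASS.fullmatch(name) and keep(name)}
    )


# A denylist filter (ignore the Tailwind names mentioned in this thread)...
tailwind = {"h-full", "h-auto", "h-min", "h-max"}
ignore_tailwind = lambda name: name not in tailwind

# ...or an allowlist filter (only known vocabularies; illustrative subset).
known = {"h-card", "h-entry", "h-feed"}

strict = root_types("h-full h-card p-name")
denylisted = root_types("h-full h-card p-name", keep=ignore_tailwind)
allowlisted = root_types("h-full h-card h-recipe", keep=known.__contains__)
```

With a hook like this, both the denylist and the allowlist approaches discussed above become caller choices, while the default behaviour stays generic.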