microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

Handling of templates #60

Open JKingweb opened 1 year ago

JKingweb commented 1 year ago

Currently the specification states only that template content should be ignored, as per HTML parsing. Implementations, however, go further and act as if templates are not there at all. The only exception is Go, which fails to observe HTML parsing requirements and treats templates like any other element. In at least one case (PHP) the templates are in fact removed from the document in a pre-processing step.
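
As a rough illustration of the pre-processing approach, here is a minimal Python sketch that drops every template element, together with its contents, before any microformats parsing would start. It is not the PHP parser's actual code: `TemplateStripper` and `strip_templates` are hypothetical names, and Python's standard `html.parser` is only a tokenizer, so this does not reproduce full HTML tree construction.

```python
from html.parser import HTMLParser


class TemplateStripper(HTMLParser):
    """Re-emit markup, dropping <template> elements and their contents.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.depth = 0  # nesting depth inside <template>
        self.out = []

    def _emit(self, text):
        # Only keep markup that is outside every template.
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "template":
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img/>: emit once, no end tag.
        if tag != "template":
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "template":
            self.depth = max(0, self.depth - 1)
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_templates(html):
    """Return html with all template elements removed."""
    stripper = TemplateStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)
```

After a step like this, a template can no longer act as a microformat root, a value-class element, or a sibling in only-child checks, which is exactly where the implementations diverge from a plain reading of the spec.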

There are several cases where a plain reading of the spec (ignore template content) differs from the four major implementations:

  1. When a template is a microformat:
    <!-- I am an empty microformat, unless you ignore templates completely -->
    <template class="h-mf"></template>
  2. When a template is a value-class:
    <!-- My name is an empty string, unless you ignore templates completely -->
    <div class="h-mf">
      <div class="p-name">
        <template class="value"></template> You cannot stifle me, apparently.
      </div>
    </div>
  3. When a template is a value-title-class:
    <!-- My name is an empty string if you ignore templates completely -->
    <div class="h-mf">
      <div class="p-name">
        <template class="value-title" title="I am a template which carries information!"></template>
      </div>
    </div>
  4. When a template factors into only-child evaluation:
    <!-- My name is an empty string, unless you ignore templates completely -->
    <div class="h-mf">
      <template></template>
      <img alt="This is a false name. Maybe.">
    </div>
    <!-- My photo is an empty string, unless you ignore templates completely -->
    <div class="h-mf">
      <template></template>
      <div>
        <img src="http://example.com/template.png">
      </div>
    </div>
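
The gap between the two readings can be sketched in Python by listing the class names a parser would get to see under each one. This is only an illustration under stated assumptions: `ClassCollector` and `visible_classes` are hypothetical names, the standard `html.parser` module stands in for a real HTML parser, and no actual microformats parsing is performed.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect class names visible under one of two template policies.

    skip_template_element=False: only template *content* is ignored
    (a plain reading of the spec text).
    skip_template_element=True: the template element itself is ignored
    as well (what most implementations appear to do).
    """

    def __init__(self, skip_template_element):
        super().__init__()
        self.skip_template_element = skip_template_element
        self.template_depth = 0
        self.classes = []

    def handle_starttag(self, tag, attrs):
        inside_template = self.template_depth > 0
        if tag == "template":
            self.template_depth += 1
            if self.skip_template_element:
                return
        if inside_template:
            # Template content is invisible under both policies.
            return
        class_attr = dict(attrs).get("class")
        if class_attr:
            self.classes.extend(class_attr.split())

    def handle_endtag(self, tag):
        if tag == "template" and self.template_depth:
            self.template_depth -= 1


def visible_classes(html, skip_template_element):
    """Return the class names seen in document order."""
    collector = ClassCollector(skip_template_element)
    collector.feed(html)
    collector.close()
    return collector.classes
```

For case 1, `visible_classes('<template class="h-mf"></template>', False)` yields `['h-mf']`, while passing `True` yields `[]`. The same switch decides whether the `value` class in case 2 is ever seen.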

Should the specification text be updated, or are implementations buggy?

Case 3 seems especially problematic as it involves potential loss of useful information. Case 2 is exercised in the test suite, but the other three cases are not.

sknebel commented 1 year ago

I'd say the intent of the spec change was "completely ignore", as the parsers do, and nobody considered that the added language does not have the same effect. As such, my vote is for "adjust the spec language".

Are there arguments why the other behavior should be considered superior? I personally can't see myself writing HTML that relies on mf2 on the <template> tag, but maybe I'm missing something.

JKingweb commented 1 year ago

There's a fifth case of template interaction I hadn't considered previously: the contents of the html property of e-properties. It would, presumably, be deeply odd for templates to be excluded from that output since mf2 simply says to use the standard HTML serializer.

sknebel commented 1 year ago

On one hand, I don't really see use cases that would want to preserve a <template> tag in an e-* property. On the other hand, most uses will have to post-process with their own sanitizers anyway, so removing them doesn't add much value. It should maybe be clarified for completeness' sake, and keeping them in would be a fine behavior to specify, but I doubt that in practice a parser removing them would break anything.