microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

Define removal of SCRIPT and STYLE elements everywhere textContent is requested. #17

Closed Zegnat closed 6 years ago

Zegnat commented 6 years ago

In practice parsers are already doing this everywhere, but that is currently against the specification. I say this is a mistake in the spec and not in parsers.

When the textContent value is used in mf2, we specify the removal of <script> and <style> elements within p-x, u-x, and dt-x parsing, but not for e-x or implied name parsing.

According to spec:

<div class="h-x">Hello <script>beautiful </script>person</div>

Results in an implied name of Hello beautiful person.
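
To make the difference concrete, here is a minimal sketch using only Python's standard library. It is not taken from any mf2 parser; the `TextContent` class and `text_content` helper are hypothetical names made up for this illustration, and the sketch ignores whitespace trimming and the other steps of the parsing spec. It only shows how the text of the example above changes once nested <script> and <style> elements are dropped.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collects text nodes, optionally skipping text inside dropped elements.

    Hypothetical helper for illustration only; not part of any mf2 parser.
    """

    def __init__(self, drop=()):
        super().__init__(convert_charrefs=True)
        self.drop = set(drop)  # tag names whose contents are skipped
        self.depth = 0         # how many dropped elements we are inside
        self.parts = []        # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.drop and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a dropped element.
        if self.depth == 0:
            self.parts.append(data)


def text_content(html, drop=()):
    """Return the concatenated text of `html`, minus any `drop` elements."""
    parser = TextContent(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = '<div class="h-x">Hello <script>beautiful </script>person</div>'

# Plain textContent, as the spec currently reads for implied names.
print(text_content(sample))                            # Hello beautiful person

# textContent after dropping nested <script> and <style> elements.
print(text_content(sample, drop=("script", "style")))  # Hello person
```

The second call reflects what parsers are reported to do in practice; the first reflects the literal reading of the spec for implied names.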

gRegorLove commented 6 years ago

Previous issue and resolution: http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing

It appears we might have just missed some instances in the spec update, but we need to double-check and confirm. See this revision.

sknebel commented 6 years ago

Given what @gRegorLove found, I'd say the missing pieces are:

a) specify the same for the value version of an e- property (which was likely missed since the html was explicitly excluded in the discussion)

b) in the section about implied name properties, make it clear that textContent should be postprocessed the same way as for p- properties.

gRegorLove commented 6 years ago

Proposed updates, which I believe are in line with the resolution:

parsing a p- property: no content change, just splitting out whitespace trimming into a separate bullet point.

Original:

  • else return the textContent of the element after:
    • dropping any nested <script> & <style> elements;