Open kartikprabhu opened 6 years ago
In-the-wild example of 5), properties nested below a <noscript> tag: @snarfed has an author h-card on his entries like
<span class="author p-author h-card vcard">
<img alt src="https://snarfed.org/…/1x1.trans.gif" class="" data-lazy-src="https://secure.gravatar.com/…">
<noscript>
<img alt='' src='https://secure.gravatar.com/…' srcset='https://secure.gravatar.com… 2x' class='… u-photo'/>
</noscript>
<a class="u-url url fn n p-name" href="https://snarfed.org/" title="Ryan Barrett" rel="author">
Ryan Barrett</a></span>
Due to the image-lazyloading code, the photo URL isn't in the non-noscript markup as a src attribute, and thus u-photo is only marked up inside the <noscript>. I don't see a reason why parsers should not support this case and look inside the tag.
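A minimal sketch of what "looking inside the tag" could mean, using only Python's standard-library html.parser rather than any real mf2 parser. The markup is a simplified, hypothetical stand-in for the h-card above (the example.com URLs are placeholders, not the elided originals), and PhotoFinder is an illustrative name:

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup modelled on the h-card above.
# All URLs are placeholders, not the real (elided) ones.
MARKUP = """
<span class="p-author h-card">
  <img alt src="https://example.com/blank.gif">
  <noscript>
    <img alt="" src="https://example.com/avatar.jpg" class="u-photo">
  </noscript>
  <a class="u-url p-name" href="https://example.com/">Example Author</a>
</span>
"""


class PhotoFinder(HTMLParser):
    """Record the src of every img.u-photo and whether it sat inside <noscript>."""

    def __init__(self):
        super().__init__()
        self.noscript_depth = 0
        self.photos = []  # list of (src, found_inside_noscript)

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.noscript_depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "img" and "u-photo" in classes:
            self.photos.append((attr.get("src"), self.noscript_depth > 0))

    def handle_endtag(self, tag):
        if tag == "noscript" and self.noscript_depth:
            self.noscript_depth -= 1


finder = PhotoFinder()
finder.feed(MARKUP)
finder.close()
print(finder.photos)
```

With this markup the only u-photo is the one nested in <noscript>, so a parser that skips the element would find no explicit photo at all.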
from https://glennjones.net/tools/microformats/
{
"items": [{
"type": ["h-entry"],
"properties": {
"name": ["Some name\nnoscript in name"],
"content": [{
"value": "This is some content\nnoscript in content",
"html": "\n<span>This is some content</span>\n<noscript>noscript in content</noscript>\n"
}],
"summary": ["This is summary inside noscript"],
"photo": ["http://example.com"]
}
}],
"rels": {},
"rel-urls": {}
}
thank you all for the in depth sleuthing! i noticed and dealt with this recently myself in https://github.com/snarfed/bridgy/issues/798 .
worth noting: noscript tag handling evidently depends on the underlying HTML parser, not mf2py itself. lxml returns noscript contents, html5lib ignores them.
@snarfed I don't think html5lib ignores them. The output for mf2py above was using html5lib and it does parse the contents of <noscript>. Maybe you meant the other way around, i.e. lxml ignores the <noscript>?
@kartikprabhu html5lib definitely ignored noscript in my testing a week ago. latest released version of mf2py afaik, Python 2.7. details in https://github.com/snarfed/bridgy/issues/798#issuecomment-370508015 . maybe differences in our environments, or because there was an earlier img without u-photo, so it used an implied rule first? who knows.
@snarfed aah! might be since I am using html5lib v 1.0.1 with the newly updated mf2py parser from my repo
This was indeed changed in html5lib 0.99999999, which mf2py just jumped past, so this behaving differently now is expected.
- Should <noscript> tags be included while explicitly parsing p-* using textContent?
These questions are interesting because noscript, like template, is a little odd. If you assume scripting to be enabled, the textContent of the noscript in <noscript><span>hi!</span></noscript> is not hi! but <span>hi!</span>, which is probably not what the HTML author is expecting from the mf2 parser.
If we assume all mf2 parsers are operating where scripting is disabled, the noscript element basically acts the same in the DOM tree as an a element and I see no reason to ignore it. This makes sense to me, unless we can show this will lead to a lot of duplicated content in the wild.
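The two readings of textContent can be sketched with Python's standard-library html.parser. This only models the difference by hand; it is not how any particular mf2 parser or HTML library implements scripting. TextContent, text_content and the assume_scripting flag are illustrative names:

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collect textContent, modelling <noscript> with scripting on or off."""

    def __init__(self, assume_scripting):
        super().__init__(convert_charrefs=True)
        self.assume_scripting = assume_scripting
        self.inside_noscript = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.inside_noscript = True
        elif self.assume_scripting and self.inside_noscript:
            # Scripting enabled: markup inside <noscript> stays raw text.
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.inside_noscript = False
        elif self.assume_scripting and self.inside_noscript:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)


def text_content(html, assume_scripting):
    parser = TextContent(assume_scripting)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


SAMPLE = "<noscript><span>hi!</span></noscript>"
print(text_content(SAMPLE, assume_scripting=False))
print(text_content(SAMPLE, assume_scripting=True))
```

Under the scripting-disabled assumption the span is an ordinary child element and only its text is collected; under the scripting-enabled assumption the markup itself becomes the text.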
- Should <noscript> tags be included in e-*[html]?
Yes. There is nothing in the HTML fragment serialization algorithm linked to by the spec that excludes them.
- Should <noscript> tags be included in e-*[value]?
I believe this should follow the same plaintext parsing as p-*. So this should match whatever is decided for question 4.
- Should the content of <noscript> tags be included while parsing explicit p-* on the <noscript> tag?
I would say yes, but the content of this textContent depends on what is decided in question 1. Again, I would lean towards the scripting-disabled case, which handles the noscript element (almost) no differently from other elements.
- Should properties nested inside a <noscript> tag be parsed?
If we can agree that mf2 parsers operate where scripting is disabled (again, see question 1) and we are treating noscript like any other transparent element, then yes.
At IWS 2018 (https://indieweb.org/2018/microformats#parsing_.2324) it was accepted to treat <noscript> as a <div>.
By this proposal the answer to all five questions posed in the beginning (https://github.com/microformats/microformats2-parsing/issues/24#issue-304172814) should be "yes".
Maybe this should be made explicit in the spec. Here is a proposal to change the section http://microformats.org/wiki/microformats2-parsing#note_HTML_parsing_rules
Add the rule:
<noscript> elements are treated as if they are <div> elements
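One way a parser could apply such a rule is to normalise the tag name before any other logic sees it. The following is a rough sketch on top of Python's standard-library html.parser, not code from any existing mf2 parser; effective_tag, PathRecorder and the VOID set are hypothetical helpers:

```python
from html.parser import HTMLParser

# Void elements never get pushed on the open-element stack (assumed list).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def effective_tag(tag):
    """Apply the proposed rule: a <noscript> element is handled as a <div>."""
    return "div" if tag == "noscript" else tag


class PathRecorder(HTMLParser):
    """Map each class token to the element path it was found at."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = {}  # class token -> "parent/child/..." path

    def handle_starttag(self, tag, attrs):
        path = self.stack + [effective_tag(tag)]
        for token in (dict(attrs).get("class") or "").split():
            self.paths[token] = "/".join(path)
        if tag not in VOID:
            self.stack.append(effective_tag(tag))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == effective_tag(tag):
            self.stack.pop()


recorder = PathRecorder()
recorder.feed(
    '<div class="h-entry"><noscript>'
    '<p class="p-summary">This is summary inside noscript</p>'
    '</noscript></div>'
)
recorder.close()
print(recorder.paths)
```

After normalisation the p-summary property simply appears under two nested div elements, so the rest of the parser needs no special case for <noscript>.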
cc: @tantek @aaronpk @kevinmarks @gregorlove
The parsing spec is currently silent on how to handle <noscript> tags. Parsers seem to handle this in different ways.
Particular issues
1. Should <noscript> tags be included while explicitly parsing p-* using textContent?
2. Should <noscript> tags be included in e-*[html]?
3. Should <noscript> tags be included in e-*[value]?
4. Should the content of <noscript> tags be included while parsing explicit p-* on the <noscript> tag?
5. Should properties nested inside a <noscript> tag be parsed?
HTML
Current Parser outputs
Ruby, Go testing from http://microformats.io
mf2py 1.1.0 (using default html5lib parser), Ruby parser 4.0.6,
phpmf2
Go