Open tantek opened 7 years ago
We can also leave this open longer, and just move forward with #4 and/or #8 until we have more evidence or consensus one way or the other.
My answer to the question in the title would be Yes.
I feel like dt-* handling should describe how a string gets turned into a datetime stamp, no matter where the string is coming from (textContent, attribute, VCP, …). I also think this would give parsers an easier job.
As I wrote in #8 and on IRC (emphasis added that is implicit to this issue):
is there some way we can generalise a vcp-to-string algo for dt- and then generalise a string-to-valid-timestamp algo that works on the string value of the dt-, then it no longer matters if that string value was obtained through regular parsing or through vcp.
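A minimal sketch of that two-stage split, to make the idea concrete. The function names `vcp_to_string` and `string_to_timestamp` are hypothetical and not from the spec. The VCP stage is heavily simplified (it assumes the date part comes first), and the choice of a space separator in the output is an assumption modelled on VCP output:

```python
import re

# Hypothetical shape check: date, optional time, optional timezone offset.
_DT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:[T ](?P<time>\d{2}:\d{2}(?::\d{2})?)"
    r"(?P<tz>Z|[+-]\d{2}:?\d{2})?)?$"
)


def vcp_to_string(parts):
    """Stage 1 (simplified): join VCP value parts into one candidate string.

    Assumes the first non-empty part is the date and the rest are
    time and timezone pieces, in that order.
    """
    parts = [p.strip() for p in parts if p.strip()]
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]
    return parts[0] + " " + "".join(parts[1:])


def string_to_timestamp(value):
    """Stage 2: validate and normalize a string, whatever its source.

    Returns None when the string does not look like a date or datetime.
    """
    m = _DT_RE.match(value.strip())
    if m is None:
        return None
    out = m.group("date")
    if m.group("time"):
        # Assumption: emit a space separator instead of "T".
        out += " " + m.group("time") + (m.group("tz") or "")
    return out
```

With this shape, `string_to_timestamp(vcp_to_string(parts))` and `string_to_timestamp(text_content)` go through the same second stage, so the origin of the string no longer matters.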
Here is a real-world example we ran into today on http://indieweb.org/events.
<span class="h-event vevent">
<span class="dt-start dtstart">
<span class="value" title="August 1, 2018">2018-08-01</span>
<span class="value" title="20:30">20:30<span style="display: none;">-5:00</span></span>
</span>–
<span class="dt-end dtend">22:00<span style="display: none;">-5:00</span></span> (-5:00 <abbr>UTC</abbr>):
<span class="p-content">An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress.</span>
</span>
php-mf2 parse:
{
"items": [
{
"type": [
"h-event"
],
"properties": {
"content": [
"An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress."
],
"start": [
"2018-08-01 20:30-0500"
],
"end": [
"22:00-5:00-0500"
]
}
}
],
"rels": {},
"rel-urls": {},
"debug": {
"package": "https://packagist.org/packages/mf2/mf2",
"source": "https://github.com/indieweb/php-mf2",
"version": "v0.4.5",
"note": [
"This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
"Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues",
"Using the Masterminds HTML5 parser"
]
}
}
mf2py parse:
{
"rels": {},
"items": [
{
"type": [
"h-event"
],
"properties": {
"content": [
"An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress."
],
"start": [
"2018-08-01"
],
"end": [
"22:00-5:00"
]
}
}
],
"rel-urls": {},
"debug": {
"source": "https://github.com/microformats/mf2py",
"version": "1.1.1",
"markup parser": "html5lib",
"description": "mf2py - microformats2 parser for python"
}
}
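Both parses above appear to carry the unpadded "-5:00" offset through untouched. As a rough sketch of what a lenient, source-independent time normalizer could do with such a string (the name `normalize_time` and the output shape are assumptions, not anything either parser or the spec defines):

```python
import re

# Lenient time pattern: accepts one- or two-digit hours and offset hours,
# with or without a colon in the offset.
_TIME_RE = re.compile(
    r"^(?P<h>\d{1,2}):(?P<m>\d{2})(?::(?P<s>\d{2}))?"
    r"(?:(?P<sign>[+-])(?P<oh>\d{1,2}):?(?P<om>\d{2})?)?$"
)


def normalize_time(value):
    """Return HH:MM[:SS][+-HHMM] for a time-only string, or None if it does not match."""
    m = _TIME_RE.match(value.strip())
    if m is None:
        return None
    out = f"{int(m.group('h')):02d}:{m.group('m')}"
    if m.group("s"):
        out += ":" + m.group("s")
    if m.group("sign"):
        # Pad the offset hour and default missing offset minutes to "00".
        out += f"{m.group('sign')}{int(m.group('oh')):02d}{m.group('om') or '00'}"
    return out
```

Applied to the dt-end text from the example, `normalize_time("22:00-5:00")` gives `"22:00-0500"`. Implying the date from dt-start is a separate step that this sketch does not cover.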
I guess this makes sense. VCP and the HTML rules for the datetime attribute of the <time> element are probably good starting points of syntax to accept, with the latter maybe being the output format too?
After having https://github.com/microformats/tests/issues/29 confirmed and resolved, the lack of this being in the standard is the only thing preventing the Rust parser from being fully compliant, thus enabling this: https://github.com/microformats/microformats2-parsing/issues/12#issuecomment-331626987
(Originally published at: https://jacky.wtf/2022/6/yy8Z)
I found some more edge cases that this spec update should cover:
- if the value has a specific ISO8601 date, time, and timezone, use those and stop looking for "value" elements.
<div class="h-event">
<span class="dt-start">
<span class="value">2022-07-05T17:30-08:00</span>
</span>
</div>
This "value" is used as-is, no normalization to remove "T" or the colon in timezone offset:
{
"items": [
{
"type": [
"h-event"
],
"properties": {
"start": [
"2022-07-05T17:30-08:00"
],
"name": [
"2022-07-05T17:30-08:00"
]
}
}
]
}
Similarly for:
- if the value has both a specific ISO8601 date and time, use those
<div class="h-event">
<span class="dt-start">
<span class="value">2022-07-05T17:30</span>
</span>
</div>
{
"items": [
{
"type": [
"h-event"
],
"properties": {
"start": [
"2022-07-05T17:30"
],
"name": [
"2022-07-05T17:30"
]
}
}
]
}
I mentioned before how this was an upstream blocker to getting the Rust library fully compatible. That's changed, but normalization would simplify parsing (and testing) date values, so I'm throwing my vote in favor of it, and I'm curious to hear whether anyone else is in favor as well.
(Originally published at: https://jacky.wtf/2023/10/evyZ)
I'm also in favour of normalizing date values everywhere, be it VCP or not. Parsers already have to perform normalization sometimes, so it adds no appreciable complexity to parsers, while simplifying things for consumers of the output of parsers.
My own parser already does this by default, for what it's worth.
While we're at it it might be worthwhile to drop : from time zones and transform Z to +0000 so that downstream consumers only have to deal with five formats in the JSON:
It's a pretty straightforward application of Postel's law, with no information lost, and no new formats added.
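A small sketch of that timezone step, assuming it runs on strings that already contain a time. The name `canonical_tz` is hypothetical, and the sketch deliberately leaves the rest of the string (including any "T") alone, since only the timezone change is being suggested here:

```python
import re

# Match a trailing "Z" or a +HH:MM / +HHMM offset, but only directly after
# a time component, so the "-DD" at the end of a plain date is never
# mistaken for an offset.
_TZ_RE = re.compile(
    r"(?<=\d{2}:\d{2})(?:(?P<z>Z)|(?P<sign>[+-])(?P<h>\d{2}):?(?P<m>\d{2}))$"
)


def canonical_tz(value):
    """Rewrite a trailing timezone as +HHMM / -HHMM; leave other strings as-is."""
    m = _TZ_RE.search(value)
    if m is None:
        return value
    if m.group("z"):
        tz = "+0000"
    else:
        tz = m.group("sign") + m.group("h") + m.group("m")
    return value[: m.start()] + tz
```

For example, `canonical_tz("2022-07-05T17:30-08:00")` yields `"2022-07-05T17:30-0800"`, and a date-only value such as `"2022-07-05"` passes through unchanged. The instant described is the same before and after, which is the "no information lost" point.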
Currently, in http://microformats.org/wiki/microformats2-parsing#parsing_a_dt-_property, special date and time parsing is only done as part of step 1, for VCP handling.
The proposal is to move the "date and time parsing rules" mentioned in step 1 (thus extracting them from VCP and inlining them into mf2 parsing) to after all the value retrieval is done, before returning a value.
This would be a larger fix that should also incorporate accepting the proposals in issues #4 and #8.
I don't have a specific real world example for this particular proposal, thus the issue title is a question. All feedback welcome, and especially real world examples that would be helped by this beyond the smaller fixes noted in #4 and #8.
Feedback explicitly requested from: @sknebel @gRegorLove @Zegnat. Thanks!