microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

vcp: Clarify VCP datetime parsing #26

Open · kartikprabhu opened this issue 6 years ago

kartikprabhu commented 6 years ago

The current parsing steps at http://microformats.org/wiki/value-class-pattern#Basic_Parsing defer to the special datetime parsing in step 3.2, which is misleading. All VCP parsing of datetime properties should follow http://microformats.org/wiki/value-class-pattern#Date_and_time_parsing

cite: https://freenode.logbot.info/microformats/20180314#c1463421

Suggestion: move step 3.2 earlier, perhaps into step 1 or step 1.1.

cc: @tantek @Zegnat

Zegnat commented 6 years ago

Here is my reasoning for why I thought date and time parsing should not be applied to single VCP elements. By "single" I mean that there is exactly one element with the class value as a direct descendant of the dt-* property element. For example, I expected both published values from this HTML to be equal:

<div class="h-entry">
  <abbr class="dt-published" title="The first Wednesday after the tenth Friday of 2018">today</abbr>
  <span class="dt-published">
    <abbr class="value" title="The first Wednesday after the tenth Friday of 2018">today</abbr>
  </span>
</div>

I expected this because:

  1. the mf2 spec's rule for parsing a dt-* property says “parse the element for the value-class-pattern, including the date and time parsing rules.” I read the word “including” here to mean that Basic Parsing still applies. (I was inferring this from the way the spec is written, perhaps wrongly, not simply assuming it.)
  2. it made no sense to me that moving the abbr into a descendant element (for whatever reason or markup constraint) would change the entire value of the property, especially since this would then happen only for dt-* properties.
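The two readings can be contrasted with a minimal sketch. It assumes Basic Parsing simply concatenates the value-element strings, and that the date and time rules keep only values that look like a date, time or timezone and ignore everything else. The function names `basic_vcp` and `datetime_vcp` and the regular expressions are hypothetical and deliberately simplified; the real VCP page accepts more formats. The space separator used when combining a date and a time is also an assumption. `None` here only means "no date-time VCP value found"; what a parser does next (for example, falling back to another value) is outside this sketch.

```python
import re

# Hypothetical, deliberately simplified patterns. The real VCP date and time
# rules accept more formats than these (e.g. more time and timezone shapes).
DATE_RE = re.compile(r"\d{4}-(?:\d{2}-\d{2}|\d{3})")
TIME_RE = re.compile(r"\d{1,2}(?::\d{2}(?::\d{2})?)?(?:[ap]m)?", re.IGNORECASE)
TZ_RE = re.compile(r"Z|[+-]\d{2}(?::?\d{2})?")


def basic_vcp(values):
    """Basic Parsing reading: concatenate the value-element strings as-is."""
    return "".join(values)


def datetime_vcp(values):
    """Date-and-time reading: keep the first date, time and timezone found,
    ignore every other value, and return None if no date or time was found."""
    date = time = tz = None
    for v in values:
        v = v.strip()
        if date is None and DATE_RE.fullmatch(v):
            date = v
        elif time is None and TIME_RE.fullmatch(v):
            time = v
        elif tz is None and TZ_RE.fullmatch(v):
            tz = v
    if date is None and time is None:
        return None
    # Attach the timezone to the time only when a time is present.
    if time is not None and tz is not None:
        time += tz
    # Assumption: a date and a time are joined with a single space.
    return " ".join(p for p in (date, time) if p is not None)
```

Applied to the single abbr title from the example above, `basic_vcp` returns the title unchanged, so it matches the plain abbr case, while `datetime_vcp` discards it and returns `None`. Under this sketch, that discrepancy is exactly the difference between the two published values.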

From the chat log, @tantek seems to suggest that “Date and time values”, being a section on the same level as “Basic Parsing”, should take precedence for datetimes. I have no objection to that, but I do think it should be stated less ambiguously somewhere, preferably in the mf2 parsing spec.

Hopefully this clears up where I was coming from!


There are currently some hints in this direction, but I do not find them very concrete. For instance, the Date and time values section starts off with:

Some microformats properties expect an ISO8601 datetime value, e.g. hCalendar dtstart and dtend, hAtom published and updated, and all microformats2 dt-* properties.

I find this almost misleading. dt-* does not “expect an ISO8601 datetime value”, or at least I couldn’t find that stated anywhere on the mf2 pages. It would also conflict with using the time element's datetime attribute, which expects values as defined by the HTML spec rather than ISO8601. (E.g. 4h 18m 3s is a valid duration string in HTML but not in ISO8601.)
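The duration example can be checked with a rough sketch. Both regular expressions below are simplified assumptions, not full validators: the first covers the ISO8601 `PnYnMnWnDTnHnMnS` shape, and the second covers only HTML's component-based duration form (number plus one of the units w, d, h, m, s). HTML also accepts a `P…T…` style form, which the second pattern does not model. The function names are hypothetical.

```python
import re

# Simplified ISO8601 duration shape: P, then optional Y/M/W/D parts, then an
# optional T followed by H/M/S parts. (?!$) rejects a bare "P".
ISO_DURATION_RE = re.compile(
    r"P(?!$)(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"
)

# Simplified HTML component form, e.g. "4h 18m 3s": one or more number+unit
# pairs (w, d, h, m, s), with optional whitespace around them.
HTML_COMPONENT_DURATION_RE = re.compile(
    r"\s*(?:\d+(?:\.\d+)?\s*[wdhms]\s*)+", re.IGNORECASE
)


def is_iso8601_duration(s):
    """True if s fully matches the simplified ISO8601 duration pattern."""
    return ISO_DURATION_RE.fullmatch(s) is not None


def is_html_component_duration(s):
    """True if s fully matches the simplified HTML component duration pattern."""
    return HTML_COMPONENT_DURATION_RE.fullmatch(s) is not None
```

With these patterns, `"4h 18m 3s"` passes the HTML check but fails the ISO8601 check, while `"PT4H18M3S"` passes the ISO8601 check. A dt-* value taken from a time element's datetime attribute could therefore be valid HTML without being ISO8601.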

I would not mind pinning dt-* to ISO8601, but then I would expect this to be made clear on the mf2 pages, to apply to non-VCP values as well, and not to be a one-line remark halfway down the VCP page.

From the mf2 prefixes page:

special parsing required: value-class-pattern and separate date time value parsing for readability

This does not, to me, clearly convey that only the date and time section of the VCP page should be used. Nor can I tell conclusively what “and separate date time value parsing for readability” refers to.

And of course there is the sentence from the parsing spec already mentioned above, with its use of the word “including”:

parse the element for the value-class-pattern, including the date and time parsing rules.