microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

imply dates also outside vcp? #8

Open sknebel opened 7 years ago

sknebel commented 7 years ago

Currently, if a dt-* property is defined only as a time (or a time with timezone) and is parsed from a value-class-pattern (VCP), a parser is supposed to add the date from other dt-* properties of the object. #4 proposes extending this to timezone information as well.

The question is whether this should be extended to other cases with incomplete information.

If it were, in an h-event like

<div class="h-event">
 <span class="e-summary">HomebrewWebsiteClub Berlin</span> will be next on 
 <span class="dt-start">
  <span class="value">2017-05-31</span>, from
  <span class="value">19:00</span></span>
to  <span class="dt-end"><span class="value">21:00</span></span>.</div>

the inner span in the dt-end property could be removed. Parsers would have to check the values of attributes to match against date or time patterns, but they already contain this logic to handle VCP. (there is some difference in error behavior though)

Note that php-mf2, microformat-shiv, mf2-py and microformats-ruby already do this (microformats2-parser for Haskell and microformats for Go don't). I didn't even notice that this wasn't supposed to work until @Zegnat pointed it out to me, despite reading through the relevant parts of the spec a few times during the last few days.
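A minimal sketch of the behavior in question, assuming the dt-* values of one h-event have already been extracted as strings in source order. The function name `imply_dates` and both patterns are hypothetical, not taken from any of the parsers named above, and the sketch only looks backwards for a date.

```python
import re

# Hypothetical patterns, loosely modelled on the VCP date and time formats.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")
TIME_RE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?(Z|[+-]\d{1,2}(:?\d{2})?)?$")


def imply_dates(dt_values):
    """Prefix time-only values with the most recently seen date."""
    result = []
    last_date = None
    for value in dt_values:
        value = value.strip()
        date_match = DATE_RE.match(value)
        if date_match:
            # Remember the date part for later time-only values.
            last_date = date_match.group(0)
        elif TIME_RE.match(value) and last_date is not None:
            value = f"{last_date}T{value}"
        result.append(value)
    return result


# dt-start carries a date, dt-end is only a time.
print(imply_dates(["2017-05-31T19:00", "21:00"]))
```

A time-only value with no earlier date in the same object is returned unchanged.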

gRegorLove commented 7 years ago

Below is an example of php-mf2 doing this.

Unless any drawbacks are raised, I'm in favor of updating the spec to match this behavior.

<span class="h-event vevent"><strong><span class="dt-start"><span class="value" title="May 31, 2017">2017-05-31</span> <span class="value" title="17:30">17:30</span></span>–<span class="dt-end">19:30</span> (local time):</strong><br><strong class="p-name summary"><a href="https://indieweb.org/events/2017-05-31-homebrew-website-club" class="u-url">Homebrew Website Club Meetup</a></strong><br>Where: <span class="p-location location">San Francisco</span>, <span class="p-location location">Berlin</span>, <span class="p-location location">London</span>, <span class="p-location location">Baltimore</span>, <span class="p-location location">Bellingham WA</span><br><span class="p-description"> Are you building your own website? Indie reader? Personal publishing web app? Or some other digital magic-cloud proxy? If so, come on by and join a gathering of people with like-minded interests. Bring your friends that want to start a personal web site. Exchange information, swap ideas, talk shop, help work on a project...</span></span>

{
    "items": [
        {
            "type": [
                "h-event"
            ],
            "properties": {
                "name": [
                    "Homebrew Website Club Meetup"
                ],
                "location": [
                    "San Francisco",
                    "Berlin",
                    "London",
                    "Baltimore",
                    "Bellingham WA"
                ],
                "description": [
                    "Are you building your own website? Indie reader? Personal publishing web app? Or some other digital magic-cloud proxy? If so, come on by and join a gathering of people with like-minded interests. Bring your friends that want to start a personal web site. Exchange information, swap ideas, talk shop, help work on a project..."
                ],
                "url": [
                    "https://indieweb.org/events/2017-05-31-homebrew-website-club"
                ],
                "start": [
                    "2017-05-31T17:30"
                ],
                "end": [
                    "2017-05-31T19:30"
                ]
            }
        }
    ],
    "rels": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "v0.3.2",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}

Zegnat commented 7 years ago

I still struggle a bit with where the parser should take the implied extra data from. The specification says:

then it adopts the date of the most recently seen dt-* property with a date in that microformat.

Is there an established meaning for “that microformat”? Does it mean within the current h-*?

gRegorLove commented 7 years ago

My understanding is that while we commonly use "microformats" to refer to all the classes we add to HTML (roots and properties), the technical meaning of microformat is the root object. So h-event is the microformat and it can have certain properties. So yes, this means implying dt-* parts from within the current h-*.

Related: #4 helped me realize the implied portion can come after the current element in the HTML source. E.g. a dt-end with a tz offset where the dt-start does not have a tz offset.

sknebel commented 7 years ago

A consequence of this would be that it becomes impossible to specify a value that is just a time if there is another property that has a date, unless a way to override it is added as well. (Or, in the context of #4, a time without a timezone if other times have one.)

For parsing values, a first suggestion would be using the patterns from vcp with relaxed whitespace rules (allow whitespace before and after elements, allow line breaks between elements, ...)?
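One way to read "relaxed whitespace rules" is sketched below: collapse all whitespace, including line breaks, before matching. The patterns are simplified assumptions rather than the exact VCP grammar, and `classify` is a hypothetical helper.

```python
import re

# Simplified, assumed building blocks; the real VCP patterns allow more forms.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{1,2}:\d{2}(?::\d{2})?"
TZ = r"(?:Z|[+-]\d{1,2}(?::?\d{2})?)"


def classify(raw):
    """Return 'datetime', 'date', 'time' or None for a property's text."""
    # Relaxed whitespace: trim the ends and collapse inner runs and line breaks.
    text = " ".join(raw.split())
    if re.fullmatch(rf"{DATE}[T ]{TIME}(?: ?{TZ})?", text):
        return "datetime"
    if re.fullmatch(DATE, text):
        return "date"
    if re.fullmatch(rf"{TIME}(?: ?{TZ})?", text):
        return "time"
    return None


print(classify("  21:00\n"))
print(classify("2017-05-31\n   19:00"))
```

Only values classified as `time` would be candidates for an implied date. Anything that does not match would be left as-is.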

Zegnat commented 7 years ago

A consequence of this would be that it becomes impossible to specify a value that is just a time if there is another property that has a date, unless a way to override it is added as well. (Or, in the context of #4, a time without a timezone if other times have one.)

Only within the same h-*, if @gRegorLove’s understanding of the phrasing is right. I would even go as far as to say that you should only look at sibling properties and never imply from data found in child objects. This might have to branch off into a separate issue for clarification on where to imply from.

I am not sure if there is a use-case for a single object where this would be a problem given those limitations. You would require two datetime properties on the same object that are not allowed to imply anything about each other, in which case they may make more sense on separate objects to begin with?

Of course I might be missing a use-case.

I’ll also throw in something I wrote on the IRC channel:

is there some way we can generalise a vcp-to-string algo for dt- and then generalise a string-to-valid-timestamp algo that works on the string value of the dt-, then it no longer matters if that string value was obtained through regular parsing or through vcp.

Maybe we are making things harder on ourselves by defining special rules for what to include and what to discard during the vcp step, for just dt-*.

If there are rules for how to extract a normalised datetime from a string, this can also be applied outside of the vcp. Thus it will also give us access to separate date / time / tz information outside of vcp.
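A rough sketch of that two-step idea, under simplifying assumptions: step 1 flattens VCP value parts into one string, and step 2 splits any dt-* string into date, time and timezone, regardless of where the string came from. Both function names and the pattern are hypothetical, and real VCP concatenation is stricter than a plain join.

```python
import re

# Assumed pattern: optional date, optional time, optional timezone.
DT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})?"
    r"[T\s]*"
    r"(?P<time>\d{1,2}:\d{2}(?::\d{2})?)?"
    r"\s*"
    r"(?P<tz>Z|[+-]\d{1,2}(?::?\d{2})?)?$"
)


def vcp_to_string(parts):
    """Step 1: flatten VCP value parts into a single string."""
    return " ".join(part.strip() for part in parts)


def string_to_parts(text):
    """Step 2: split a dt-* string into date, time and tz (None if absent)."""
    match = DT_RE.match(text.strip())
    if match is None:
        return {"date": None, "time": None, "tz": None}
    return match.groupdict()


# The same second step handles VCP output and plain text content.
print(string_to_parts(vcp_to_string(["2017-05-31", "19:00"])))
print(string_to_parts("21:00"))
```

With the parts separated like this, implying a missing date or timezone becomes a merge of two such dictionaries, independent of VCP.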

tantek commented 7 years ago

There is no "just a time" feature in mf2. dt-* properties are for datetimes (hence the "dt"). The TZ is optional however as there are use-cases of "floating" datetimes (local to whatever timezone).

Zegnat commented 7 years ago

There is no "just a time" feature in mf2. dt-* properties are for datetimes (hence the "dt").

The problem here is that dt-* property parsing does not specify that at all. Only the VCP rules do. If I have a <span class="dt-end">18:00</span>, this is only a time, and parsers should return it as such according to the specification. As @sknebel noted in the original post, some parsers then decide to apply implied dates and some do not. Implying dates actually breaks the specification at this point.

That’s one of the reasons I am arguing for taking the datetime parsing rules out of vcp and into the dt-* parsing.