jupyter-book / mystmd

Command line tools for working with MyST Markdown.
https://mystmd.org/guide
MIT License
191 stars 61 forks source link

📅 Fix date representation #1470

Closed agoose77 closed 2 weeks ago

agoose77 commented 3 weeks ago

Our existing handling of dates parsing and rendering is vulnerable to some quirks (bugs) in different timezones. Internally we assume that we represent dates in UTC. Any parsing logic that breaks this assumption, such as non-ISO 8601 strings (which parse in the local timezone), leads to formatting errors down the road.

  1. ISO 8601 date string (in UTC+1):
    > process.env.TZ = "Europe/London"
    > new Date("2024-06-06")
    2024-06-06T00:00:00.000Z
  2. Non-ISO 8601 date string (in UTC+1):
    > process.env.TZ = "Europe/London"
    > new Date("6 Jun 2024")
    2024-06-05T23:00:00.000Z

    The YAML spec supports ISO 8601 date strings, which are parsed (without using the local timezone) into Date objects. But strings of the form 6 Jun 2024 are handled by new Date directly, which causes trouble!

To further complicate things, it it not possible to identify the providence of a Date object, i.e.

> new Date("2024-06-06T00:00:00+03:00")
2024-06-05T21:00:00.000Z

looks identical to

> new Date("2024-06-05T21:00:00.000Z")
2024-06-05T21:00:00.000Z

This PR proposes:

  1. Represent dates in validated frontmatter as ISO 8601 date-strings, i.e. YYYY-MM-DD.
  2. Never parse a date string in the local timezone
  3. Change from non UTC timezones to UTC using conversion

These changes ensure that we never use the local timezone in date parsing, and we always have valid (UTC) ISO 8601 date strings.

Fixes #1357

changeset-bot[bot] commented 3 weeks ago

🦋 Changeset detected

Latest commit: 794c7b0e20b7686d0f61b37d07e21c8c2ae9b475

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages | Name | Type | | ----------------- | ----- | | simple-validators | Minor | | myst-frontmatter | Patch | | mystmd | Patch | | myst-common | Patch | | myst-config | Patch | | myst-spec-ext | Patch | | myst-cli | Patch |

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

agoose77 commented 3 weeks ago

This PR introduces a new problem; moment.js complains about non RFC 2822/ISO 8601 formatted strings.

@rowanc1 all of this work leads me to think we should be stricter here and not accept under-specified dates.

My original approach here was to drop time information altogether, so that we can reasonably talk about "plain dates" (e.g. https://tc39.es/proposal-temporal/docs/plaindate.html). By adopting ISO 8601 dates, we can communicate this concept.

However, there's still ambiguity about what users are inputting. I think the solution might be to choose a small set of valid dates, i.e. ISO 8601 and a [ day-of-week "," ] date scheme taken from RFC 2822 (DAY MONTH YEAR).

agoose77 commented 3 weeks ago

Following a meeting, we discussed next steps:

agoose77 commented 3 weeks ago

@rowanc1 one of the quirks of these changes is that we can't detect timestamps in YAML frontmatter, because the YAML loader implicitly casts these to Date objects. This is not required by the YAML spec; it's an application-specific thing, and Python's PyYaml library does not do this.

We can build our own schema to turn this off pretty easily: https://github.com/nodeca/js-yaml/blob/0d3ca7a27b03a6c974790a30a89e456007d62976/lib/schema/default.js#L13

import yaml from 'js-yaml';

const schema = yaml.CORE_SCHEMA.extend({
  implicit: [yaml.types.merge], 
  explicit: [yaml.types.binary, yaml.types.omap, yaml.types.pairs, yaml.types.set]
})

const data = yaml.load(..., { schema })

What do you think? If we don't do this, then it means that we can't show a warning to users if they put time-stamps in the ISO 8601 date of their myst.yml (or other YAML frontmatter source).

rowanc1 commented 3 weeks ago

Ok - I think just a few tests that show the various parts of the parsing logic (including some invalid dates), and this is good to go!

agoose77 commented 3 weeks ago

@rowanc1 looks like we didn't handle null dates in the frontend (see https://github.com/jupyter-book/myst-theme/pull/451). This PR now passes-through null (like it used to) to signal "today", rather than encoding the time-of-build into frontmatter. Do you think that's wrong?

rowanc1 commented 3 weeks ago

I don't remember the null/today date being a feature previously, and left a comment on the null:

What is the use case for this? Can we send through "today" instead? null-ish to me means no date.

I can (maybe) see that a preprint/blog gets a today string, that is recognized by the CLI and then serialized to the frontend as the date it was built on. That is pretty dangerous though - it means all your blog posts are always written "today" whenever you rerender them (which is probably not what people want).

I don't know if I understand the use case where we want the frontend to show today's date whenever you refresh .. ?

I would opt to remove that and constrain to valid dates.

agoose77 commented 3 weeks ago

Yes, I think the semantics of 'today' should be to encode the authorship date.

I propose that no date value in frontmatter means that we generate the current date and store it. If the user sets date to null, then the frontend doesn't display a date.

Do you think that sits well with how people are already using myst?

agoose77 commented 3 weeks ago

I reread your last - rebuilds changing metadata is a serious thing to avoid, so we should make it hard to do that.

As such, I think this PR is correct - and we just need to continue hiding null values in the frontend.

rowanc1 commented 3 weeks ago

I do not think we should auto-magic the date in.

I am coming at this from (1) documentation - the default is not to have a date in the title, if you want it you should put it in explicitly; (2) scientific articles - the date is meaningful and should be put in intentionally, having the date be auto-magic-ed in means that we don't have that information in the YAML, it is not committed to git, and difficult to ensure at a check level that it has actually been input.

rowanc1 commented 3 weeks ago

rebuilds changing metadata is a serious thing to avoid

Yes - that is very succinct way to put it. :)

agoose77 commented 3 weeks ago

@rowanc1 this is now done imo.

rowanc1 commented 3 weeks ago

Will take a final look in a few hours and merge it in!

rowanc1 commented 2 weeks ago

Minor updates to cover a few more cases (e.g. 2023-02-29 is not valid!) and less technical language on the error messages.