mathesar-foundation / mathesar

Web application providing an intuitive user experience to databases.
https://mathesar.org/
GNU General Public License v3.0
2.41k stars 332 forks source link

Spec for human readable date and time parsing #845

Closed kgodey closed 2 years ago

kgodey commented 2 years ago

Problem

We'd like to support parsing human readable dates and times when users are interacting with data with date/time types. For example, a user might want to filter on a column "within the next 3 months" or "2 hours ago". However, we don't have a clear spec for what formats to support and how to make sure parsing is consistent across the frontend and backend.

Proposed solution

We should write a spec for what formats we will support and how to make sure it's consistent between the frontend and backend. We want frontend parsing so that we can validate user input without being slowed down by a round trip to the backend. We also want backend parsing so that we can parse data in bulk imports.

The desired output is a spec on the wiki's Architecture section that outlines a solution that works for both frontend and backend. Once the spec is ready, we will create new issues to implement it.

Additional context

mathemancer commented 2 years ago

Here's the docs page with all info about how PostgreSQL parses dates and times: https://www.postgresql.org/docs/13/datetime-appendix.html

Here's the page with info about date/time types: https://www.postgresql.org/docs/13/datatype-datetime.html

dmos62 commented 2 years ago

So far we've been talking about human-language (human-readable is a bit of a misnomer, since a human can read ISO8601) relative time (e.g. 5 hours ago) and relative period (e.g. within last 5 hours) specification.

It's good for the user to get a lot of feedback about what implications we're making about his specification. I'm sometimes surprised by a device defaulting to weeks starting on Sundays (relevant when parsing "since last week" for example). Similarly, it's good to communicate to a user to what type (is it a date, a time, a period, a relative time, a relative period) his specifications will be parsed to.

The most holistic approach might be to introduce new types for relative time/period specifications: "5 minutes ago" specifies a point in time relative to another unspecified point in time: so we save it as such (relative point in time) and let the user cast it to an absolute point in time if he wants to (he might not want to); if he does cast he can then specify the absolute point in time required to turn a relative point in time to an absolute.

mathemancer commented 2 years ago

You've pointed out a number of things that we will need to specify.

One quick note about:

The most holistic approach might be to introduce new types for relative time/period specifications: "5 minutes ago" specifies a point in time relative to another unspecified point in time: so we save it as such (relative point in time) and let the user cast it to an absolute point in time if he wants to (he might not want to); if he does cast he can then specify the absolute point in time required to turn a relative point in time to an absolute.

I don't think the relative time has meaning beyond the INTERVAL type without the absolute time it's measured from. I.e., "5 minutes ago" is an interval "-5 minutes" summed with the current time. If we're not going to realize the absolute time, it's just an interval.

dmos62 commented 2 years ago

If we're not going to realize the absolute time, it's just an interval.

Good point. I didn't realise INTERVAL can be negative.

mathemancer commented 2 years ago

After fiddling / struggling with the INTERVAL type quite a bit, my preferred option is to use a string conforming to ISO8601 as the canonical representation for all date, time, datetime (timestamp), and duration (interval) types. When querying / selecting, this would be what's returned by the API. For inserts, the front end could either send along the string as the user inputs it, or optionally do some validation and then submit either the original string or a modified version conforming to ISO8601.

Questions:

Ideas I tried / partially implemented, then discarded:

seancolsen commented 2 years ago

@mathemancer

Are you okay with modifying and prettifying that format for display to the user?

Just to make sure I understand your question correctly, you're proposing that we standardize the API to always transfer date, time, date-time, and duration data as a string in conformance with ISO-8601. Correct? This would be opposed to, say, transferring a date as a unix timestamp number or as a formatted string like "Jan 27, 2022". If I'm understanding correctly, then yes! That sounds great!

mathemancer commented 2 years ago

@seancolsen Yes, that's what I'm proposing. Note that the ISO8601 format is pretty ugly by default.

Edit: The API will accept non-conforming strings and use the default PostgreSQL functionality to attempt to understand them (the PostgreSQL implementation is quite featureful). The returned value will always be ISO8601 conforming.

ghislaineguerin commented 2 years ago

@mathemancer I think it’s ok to have it later unless it creates inconsistencies for the user. If when we introduce it later we break things then it might be better to have it as part of this. I mean for the week date system.

mathemancer commented 2 years ago

@mathemancer I think it’s ok to have it later unless it creates inconsistencies for the user. If when we introduce it later we break things then it might be better to have it as part of this. I mean for the week date system.

It wouldn't break anything to introduce later. It would be strictly additive. The reason I want to hold off is that only a few countries care about week numbers, and it adds a bunch of pain to the implementation. However, it is a feature that Germans using the product would probably expect.

kgodey commented 2 years ago

I think it's fine to use ISO8601 representations in the API and rely on the frontend to do date/time parsing when interacting with the API.

We will still need to parse dates, times, and durations in the backend while doing imports. @mathemancer, it sounds like we're going to rely on default Postgres parsing here. Can you document what formats it supports or link to it when you write the spec? We should make sure that the frontend parsing is the same so that users can expect the same behavior however they enter dates.

It's fine to defer support for the ISO week date system, please create a ticket to track the work and put it in the future consideration milestone.

pavish commented 2 years ago

I also agree that it is better to always use one particular format, in this case ISO8601, for data returned by our APIs.

The frontend can format it for display however required.

mathemancer commented 2 years ago

@mathemancer, it sounds like we're going to rely on default Postgres parsing here. Can you document what formats it supports or link to it when you write the spec?

Sure thing.

kgodey commented 2 years ago

@mathemancer Please close this once you're done with the spec updates.

kgodey commented 2 years ago

This was resolved in https://github.com/centerofci/mathesar-wiki/pull/40