popolo-project / popolo-spec

International legislative data specifications
http://www.popoloproject.com/
99 stars 18 forks source link

reduced dates #7

Closed jpmckinney closed 9 years ago

jpmckinney commented 11 years ago

PopIt uses MongoDB's native Date objects, so it cannot use ISO 8601:2004 reduced dates.

evdb commented 11 years ago

The approach in PopIt is (was :) ) to treat all dates as a range. This works well in that it can represent a specific date like 2 Mar 2012 (by using a range from midnight to the following midnight). It can also represent arbitrary ranges like 1 Jan 2012 to 30 April 2012.

However it has the following problems:

But the following advantages:

jpmckinney commented 11 years ago

With respect to the advantages:

  1. Does that arbitrary constraint come up in practice? How big a disadvantage is that constraint?
  2. Can you provide an example where the ISO reduced dates do not give the sort order you would expect, when sorting their original values as strings? All I can tell is that you need to do some magic for your sentinel value "future" and null values to be properly sorted.
  3. Linking to #20
evdb commented 11 years ago

Does that arbitrary constraint come up in practice? How big a disadvantage is that constraint?

It has not, but it is realistic to expect that it will. Without this it won't be possible to represent "Q2 2012" or "late 2012" more accurately than just "2012". It would also not be possible to represent a particular week of the year if that week spanned two months. And using the ISO 8601 format of 2012-W12 would not sort correctly either.

For the Kenyan elections we would have been able to give a very specific range that the election should have occurred in according to the legislation.

Can you provide an example where the ISO reduced dates do not give the sort order you would expect, when sorting their original values as strings?

Ascending sorts tend to be fine, the problem is that when you do descending sorts with partial dates the expected result is not just the reverse of the ascending. Take these dates as an example:

ascending: 2001, 2001-03, 2001-03-15, 2001-04
descending: 2001, 2001-04, 2001-03, 2001-03-15

This sorting order could be contentious, but it is what we found to be the expected order when displaying the dates to the user. Having each partial date contain a definite high and low date for the range made sorting easier. Notably the Extended Date/Time Format (EDTF) 1.0 spec dodges the question completely.

All I can tell is that you need to do some magic for your sentinel value "future" and null values to be properly sorted.

That is a little messy, but using 'future' to distinguish between not known and not happened yet was useful. It is not deal though - if future is the end of the range it is a good way to say something like '15 Mar 2001 onwards', if future is the start of the range then its meaning is not clear.

jpmckinney commented 11 years ago
  1. ISO 8601 already has support for time intervals, e.g. 1985-04-12T23:20:50/1985-06-25T10:30:00. This is quite obviously awful for querying against. Having use cases (along the lines of your "test suite" from an earlier email) to lend support to why we would want to do certain types of queries would be great.
  2. Can we come up with a list of use cases in which we must sort by date, and in which the date is likely to be a reduced date? (If all use cases are ones where we are always have full dates, then sorting reduced dates is a non-issue). For these use cases, it would also be relevant to include an idea of how many records would be sorted. If it turns out that the number of records is small, then that sorting can be done by the app and not the DB. I see the "correct" sort order as being contentious - preferably our use cases can leave it up to each app.

Re: future let's have that discussion in #17

Update: Just for clarity:

Default reverse sort order on strings:

2001-04, 2001-03-15, 2001-03, 2001

Desired reverse sort order on string:

2001, 2001-04, 2001-03, 2001-03-15

i.e. sort by precision in normal order, then reverse sort by string all the strings with the same precision

jpmckinney commented 11 years ago

Btw, here is a US gov draft specification which may be more suitable than ISO 8601: http://www.loc.gov/standards/datetime/pre-submission.html

jpmckinney commented 9 years ago

Closing, as this has not come up again as an issue. We can re-open as necessary.

girogiro commented 5 years ago

Regarding descending order of date strings.

Default reverse sort order on strings:

2001-04, 2001-03-15, 2001-03, 2001

Desired reverse sort order on string:

2001, 2001-04, 2001-03, 2001-03-15

i.e. sort by precision in normal order, then reverse sort by string all the strings with the same precision

This is acually not true. If 2000 is added to the values above it's not sorted last where it should be but right after 2001.

A working SQL solution for descending order is ORDER BY CONCAT(value, 'T25') DESC that works also for combined date and time values. The same is easy to implement in app code.

Furthermore, I suggest to use and | as internal values for past and future, respectivelly. They are sorted correctly ( < any date < |) both for ascending and descending order and remind asterisk and dagger commonly used for birth date and death date.