Proposals for modeling dates (semanticarts/gist #340)

rjyounes commented 4 years ago

At the 8/6/20 gist council, Michael presented a discussion of the issues around date modeling and the various options for addressing them. There was a call for proposals, and those proposals are attached here.

rjyounes commented 4 years ago

Proposal from Ted Hills attached: TimeIntervalsInGist.pdf

rjyounes commented 4 years ago

From Peter Winstanley:

I'm sorry I missed what must have been a great discussion.

I don't know if I'm repeating anything here, but I want to draw attention to the notion of having a service and a URI-based approach to describing calendars (we use many) and instants, where dereferencing a URI through the service allows the agent to discover the adjacent segments as well as what lies within a period (perhaps in many calendar frames of reference).

See https://www.epimorphics.com/using-interval-set-uris-in-statistical-data/

For a quick example:

https://reference.data.gov.uk/id/quarter/2006-Q1
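To make the idea concrete, here is a minimal Python sketch that derives quarter URIs of the form shown above and computes a quarter's bounds and its neighbouring quarters locally. It assumes only the `/id/quarter/YYYY-Qn` pattern visible in the example; the function names are hypothetical, and the sketch does not call the reference.data.gov.uk service, which is what would supply this information by dereferencing.

```python
from datetime import date, timedelta

# Pattern copied from the example URI; no other interval types are assumed.
QUARTER_BASE = "https://reference.data.gov.uk/id/quarter/"


def quarter_of(d: date) -> int:
    """Calendar quarter (1-4) containing d."""
    return (d.month - 1) // 3 + 1


def quarter_uri(d: date) -> str:
    """URI of the quarter containing d, e.g. .../id/quarter/2006-Q1."""
    return f"{QUARTER_BASE}{d.year}-Q{quarter_of(d)}"


def quarter_bounds(d: date) -> tuple[date, date]:
    """Half-open [start, end) bounds of the quarter containing d."""
    q = quarter_of(d)
    start = date(d.year, 3 * (q - 1) + 1, 1)
    end = date(d.year + 1, 1, 1) if q == 4 else date(d.year, 3 * q + 1, 1)
    return start, end


def adjacent_quarter_uris(d: date) -> tuple[str, str]:
    """Previous and next quarter URIs, computed locally instead of by dereferencing."""
    start, end = quarter_bounds(d)
    return quarter_uri(start - timedelta(days=1)), quarter_uri(end)
```

For 14 February 2006, `quarter_uri` yields the example URI above, and `adjacent_quarter_uris` yields the 2005-Q4 and 2006-Q2 URIs.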

Peter

rjyounes commented 4 years ago

Response to Ted Hills's document from John Cowan:

Anyway, I want to start with this passage from p. 4:

Consider the following W3C (ISO 8601) representations of the first instant of the year 2020:

to the day: 2020-01-01
to the minute: 2020-01-01T00:00Z
to the second: 2020-01-01T00:00:00Z
to the millisecond: 2020-01-01T00:00:00.000Z
to the nanosecond: 2020-01-01T00:00:00.000000000Z

Each of these strings specifies the same time instant that started the year 2020. They only vary by the degree of resolution to which they specify that instant.

Now it seems to me that this is simply, flatly false; these values do not represent the same time at all. If I assert "I was born on 1958-07-02", that is a truth. But it is not a truth to assert the supposedly equivalent claim "I was born at 1958-07-02T00:00:00Z." Far from it. On the other hand it would be a truth to assert "I was born at 1958-07-02T19:20Z", or at least that's what my mother told me, and the two higher-resolution claims can't possibly both be true.

From this I conclude that a TimeInstant is not a mathematical, or even physical, point in time at all, still less an interval in time (that is, an amount of time between two points). Rather, it is what I will call an "atom of time": a time period that we choose to treat as having no parts for our present purposes. (In physics, so-called atoms have been discovered to have parts, but you can still do a lot of physics, and even more chemistry, while pretending they don't.)

On this view, I was born at the atom of time labeled "1958-07-02". I was also born at the atom of time labeled "1958-07-02T19:20Z", and the atom labeled "1958", and however many other atoms I choose to specify. And these statements are consistent, whereas a claim that I was born at the atom "1959" would not be.

On this view likewise, the XSD datatypes dateTime, date, gYear, and gYearMonth specify atoms of time of particular sizes. The datatypes time, gMonthDay, gMonth, and gDay represent unbounded sequences of atoms of time, all of the same size. And xsd:duration? It represents not a mathematical interval between mathematical points, but the temporal distance between two atoms of time, not necessarily of the same size.
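The "atom of time" reading can be sketched in Python by mapping each lexical form to the span it covers and treating two "I was born at X" claims as compatible only when their spans overlap. This is a minimal illustration under stated assumptions: it handles only the gYear, gYearMonth, date, and minute-resolution UTC dateTime forms mentioned above, it treats zone-less values as UTC, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from itertools import combinations

UTC = timezone.utc


def atom(lexical: str) -> tuple[datetime, datetime]:
    """Return the half-open span [start, end) that a lexical form covers.

    Handles gYear (1958), gYearMonth (1958-07), date (1958-07-02) and a UTC
    dateTime given to the minute (1958-07-02T19:20Z). Forms without a zone
    are treated as UTC purely to keep the sketch short.
    """
    if len(lexical) == 4:  # gYear: one year
        y = int(lexical)
        return datetime(y, 1, 1, tzinfo=UTC), datetime(y + 1, 1, 1, tzinfo=UTC)
    if len(lexical) == 7:  # gYearMonth: one month
        y, m = map(int, lexical.split("-"))
        start = datetime(y, m, 1, tzinfo=UTC)
        end = datetime(y + 1, 1, 1, tzinfo=UTC) if m == 12 else datetime(y, m + 1, 1, tzinfo=UTC)
        return start, end
    if len(lexical) == 10:  # date: one day
        start = datetime.strptime(lexical, "%Y-%m-%d").replace(tzinfo=UTC)
        return start, start + timedelta(days=1)
    if len(lexical) == 17 and lexical.endswith("Z"):  # dateTime to the minute
        start = datetime.strptime(lexical, "%Y-%m-%dT%H:%MZ").replace(tzinfo=UTC)
        return start, start + timedelta(minutes=1)
    raise ValueError(f"unsupported lexical form: {lexical!r}")


def consistent(a: str, b: str) -> bool:
    """Two 'born at X' claims can both be true only if their atoms overlap."""
    a_start, a_end = atom(a)
    b_start, b_end = atom(b)
    return a_start < b_end and b_start < a_end


def all_consistent(claims: list[str]) -> bool:
    """True if every pair of claims is mutually consistent."""
    return all(consistent(a, b) for a, b in combinations(claims, 2))
```

Under this sketch, "1958", "1958-07-02", and "1958-07-02T19:20Z" are mutually consistent, while "1959" is not, and neither are two different minute-sized atoms on the same day.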

Now I think that formally your proposal and mine (if I had one) would look similar or even the same. But the semantics of it would be quite different, because it matters a great deal what the resolution (what I called granularity during the meeting) of an atom of time is.

rjyounes commented 4 years ago

Response to John Cowan from Dave McComb:

In general I agree with this. Our mechanism for representing it in gist is a precision attribute on the instant, so you can know whether it is precise to a day (calendar precision), a second (human precision), or a millisecond (system precision).
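A precision attribute of this kind might look like the following Python sketch. The `Precision` enum, `Instant` class, and `agree` function are hypothetical and are not gist's actual vocabulary; the sketch assumes that comparing two instants means comparing them at the coarser of their two precisions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Precision(Enum):
    """Hypothetical encoding of the three precisions named above."""
    CALENDAR = 0  # precise to a day
    HUMAN = 1     # precise to a second
    SYSTEM = 2    # precise to a millisecond


def truncate(dt: datetime, p: Precision) -> datetime:
    """Drop every digit finer than precision p."""
    if p is Precision.CALENDAR:
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if p is Precision.HUMAN:
        return dt.replace(microsecond=0)
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)


@dataclass(frozen=True)
class Instant:
    value: datetime
    precision: Precision

    def __post_init__(self) -> None:
        # Normalise the stored value so it carries no spurious digits.
        object.__setattr__(self, "value", truncate(self.value, self.precision))

    def lexical(self) -> str:
        if self.precision is Precision.CALENDAR:
            return self.value.date().isoformat()
        spec = "seconds" if self.precision is Precision.HUMAN else "milliseconds"
        return self.value.isoformat(timespec=spec)


def agree(a: Instant, b: Instant) -> bool:
    """True if a and b coincide when compared at the coarser of their precisions."""
    coarser = min(a.precision, b.precision, key=lambda p: p.value)
    return truncate(a.value, coarser) == truncate(b.value, coarser)
```

A day-precision instant for 1958-07-02 then agrees with a second-precision instant at 19:20:00 that day, but a second-precision instant at 00:00:00 does not.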

rjyounes commented 4 years ago

Response to John Cowan from Ted Hills:

Hi John,

Thanks for your feedback. Firstly, we agree on resolution/granularity. But from that point we diverge, and I see this as an issue of the philosophy of semantics. One issue that needs resolving, as I see it, is the difference between what we mean when we speak and how we record that meaning in a knowledge graph.

When I say, “I was born in 1958”, the listener understands that I am identifying a time interval that began on 1 January 1958 at midnight and ended just before midnight on 1 January 1959. When I say, “I was born on July 2, 1958, at 7:20 PM UTC”, I am identifying a time interval with a duration of a minute. We agree.

But how do you record this information in a way over which computations can be made? You may, if you wish, restrict your facility of expression in a knowledge graph to identifying time intervals whose duration is derived from the resolution of the last digit of the xsd:dateTime string, but why is this good? It is certainly limiting. For example, how do I express the time interval of the first 5 seconds of 1958? That began at 1958-01-01T00:00:00, but I don’t mean the whole year. So I need to supply an additional piece of information, either “5 seconds”, or the end time of 1958-01-01T00:00:05. Do you envision some alternative way to say “the first 5 seconds of 1958”?
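Ted's point that a start instant alone is ambiguous can be illustrated with a small Python sketch in which an interval is given either as a start plus an end or as a start plus a duration. The `TimeInterval` class is hypothetical and assumes a half-open convention (start included, end excluded).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeInterval:
    """Hypothetical interval type: start instant included, end instant excluded."""
    start: datetime
    end: datetime

    @classmethod
    def from_start_and_duration(cls, start: datetime, duration: timedelta) -> "TimeInterval":
        return cls(start, start + duration)

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def contains(self, t: datetime) -> bool:
        return self.start <= t < self.end


# "The first 5 seconds of 1958", written both ways, and the whole year for contrast.
first_5s = TimeInterval(datetime(1958, 1, 1), datetime(1958, 1, 1, 0, 0, 5))
same = TimeInterval.from_start_and_duration(datetime(1958, 1, 1), timedelta(seconds=5))
year_1958 = TimeInterval(datetime(1958, 1, 1), datetime(1959, 1, 1))
```

Both encodings of the first five seconds compare equal, while sharing a start instant with the whole of 1958 does not make them the same interval.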

As for “atoms of time”: please provide proof that they exist other than in our minds. They are notional, and therefore not absolute. You can convince me that they are a useful notion—but you haven’t convinced me yet. The analogy to physical atoms, which provably exist, is therefore not an appropriate one. The concept of a line consisting of points and intervals between points is also purely notional, but it has been found useful by thousands of mathematicians for centuries now, and is even useful for physicists, architects, engineers, builders, surveyors, . . . .

Please help me to understand your concept of atoms of time: What is the duration of time between Tuesday and Thursday of a given week?

There’s a phrase I love that, I think, captures the essence of this kind of debate. It is, “Ontology recapitulates philology.” This echoes the phrase, “Ontogeny recapitulates phylogeny,” which expresses the now-discredited notion that human fetal development passed through stages that mimicked or recapitulated evolutionary phases of development from lower-order animals to higher-order animals. Similarly, the idea is not creditable that our ontologies should do no more than represent the words we use.

When we hear the statement, “I was born in 1958”, we are able to interpret it precisely only by drawing on a vast quantity of cultural and linguistic knowledge, knowledge that is not natively available to a computing system. We understand implicitly that a time interval of one year is meant. But, in my opinion, I think we do no service if we do not record that contextual information in a knowledge graph when we record the basic statement.

I deal with this all the time in my natural-language processing work. A financial analyst might say, “share price increase” and “market share increase” in the same sentence. My whole job in NLP is to figure out how to represent, in a knowledge graph, the difference between shares of stock and market share, especially when the words used overlap so much. If I simply developed a way to represent the words directly, without representing their different meanings, I would be guilty of recapitulating the philology in the ontology rather than representing the meaning.

Therefore I don’t think the argument is a good one that the representation of time in a knowledge graph should exactly parallel how we speak, when such a representation makes it difficult to represent arbitrary time intervals. When our ontologies can parallel our speech, that is good, and in fact that is a good goal to achieve when possible.

But when our language patterns convey information beyond the literal words, I think the duty of the ontologist is to record this implicit information as well, to ensure that it is available for computation when the context has been lost.

rjyounes commented 4 years ago

Diagram from Bhoomin Pandya at Concept Miners:

[Diagram: Time Model]

johnwcowan commented 4 years ago

When I say, “I was born in 1958”, the listener understands that I am identifying a time interval that began on 1 January 1958 at midnight and ended just before midnight on 1 January 1959. When I say, “I was born on July 2, 1958, at 7:20 PM UTC”, I am identifying a time interval with a duration of a minute. We agree.

I don't think we do agree. I don't think "1958" in "I was born in 1958" represents an interval at all. It represents a specific time with a granularity of one year. It's true that extensionally "I was born between 1958-01-01 and 1958-12-31 inclusive" means the same thing as "I was born in 1958", but intensionally it is quite different, just as "The number of planets is eight" is quite different intensionally from "The number of planets is the number of planets."

For example, how do I express the time interval of the first 5 seconds of 1958? That began at 1958-01-01T00:00:00—but I don’t mean the whole year.

I don't think you do when talking about granular moments of time. We simply don't divide a year into 6,307,200 moments (or 6,324,480 in a leap year), each five seconds long. We do have moments with a granularity of a week, like "I was born in week 27 of 1958" (or "1958W27" in ISO notation), and the VMS operating system granulated time into 100-ns clunks starting at midnight on Modified Julian Day 0 ("1858-11-17"). But though we can use any granularity we like, in practice we use only a few.
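The figures in this comment can be checked with a short Python sketch: the count of five-second moments in common and leap years, the span of the ISO week 1958W27, and a conversion to 100 ns ticks since the VMS epoch given above. The helper names are hypothetical, and because Python's `datetime` resolves only microseconds, the tick conversion can only produce multiples of 10.

```python
from datetime import date, datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def five_second_moments(days_in_year: int) -> int:
    """How many 5-second moments a year of the given length would contain."""
    return days_in_year * SECONDS_PER_DAY // 5


def iso_week_span(year: int, week: int) -> tuple[date, date]:
    """Half-open [Monday, next Monday) span of an ISO 8601 week such as 1958W27."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=7)


VMS_EPOCH = datetime(1858, 11, 17)  # midnight on Modified Julian Day 0


def vms_ticks(dt: datetime) -> int:
    """100 ns ticks since the VMS epoch (datetime only resolves 1 us = 10 ticks)."""
    return (dt - VMS_EPOCH) // timedelta(microseconds=1) * 10
```

Week 27 of 1958 runs from Monday 30 June to Sunday 6 July, so it does contain 2 July.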

On the other hand, when we are talking about an interval, then indeed we can and do say that the interval in question began at 1958-01-01T00:00:00 and ended at 1958-01-01T00:00:05. Each endpoint has a granularity of 1 second, and the duration of this interval is therefore 5 seconds, or "PT5S" in ISO notation.
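A hedged Python sketch of formatting such a duration in ISO 8601 notation follows. `iso_duration` is a hypothetical helper that emits only days, hours, minutes, and seconds, deliberately avoiding years and months because their length varies.

```python
from datetime import datetime, timedelta


def iso_duration(td: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration.

    Only days, hours, minutes and seconds are emitted; years and months are
    avoided because their length varies.
    """
    if td < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{td.days}D" if td.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if td.microseconds:
        frac = f"{td.microseconds:06d}".rstrip("0")
        time_part += f"{seconds}.{frac}S"
    elif seconds:
        time_part += f"{seconds}S"
    if not date_part and not time_part:
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


start = datetime(1958, 1, 1, 0, 0, 0)  # endpoint granularity: 1 second
end = datetime(1958, 1, 1, 0, 0, 5)    # endpoint granularity: 1 second
print(iso_duration(end - start))  # PT5S
```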

As for “atoms of time”: please provide proof that they exist other than in our minds. They are notional, and therefore not absolute.

Absolutely they are notional.

You can convince me that they are a useful notion—but you haven’t convinced me yet. The analogy to physical atoms, which provably exist, is therefore not an appropriate one

No, and I shouldn't have brought them up. By atom I simply mean something that has for present purposes no parts, an ἄτομος, an individual. English has no word for these that is generic over their size: we can talk of the days or the months or the years of my life so far, but we have no term that encapsulates all these things.

I have been using "moment", even though it is a violation of ordinary language to say that a year is a moment (though Christians are told that one day is like a thousand years with God), simply because it's the best term I have thought of so far.

And I agree that the standard sizes of moments are purely conventional, but they are the ones we use, and I think the idea of granular time is not conventional but universal.

The concept of a line consisting of points and intervals between points is also purely notional, but it has been found useful by thousands of mathematicians for centuries now,

Absolutely.

and is even useful for physicists, architects, engineers, builders, surveyors, . . . .

But only if they take into account granularity. A scientist's or engineer's "point" is emphatically not a mathematician's point. As Charles Sanders Peirce said, if you ask a scientist what the temperature is, they don't answer you directly: they say that it is measured to be somewhere in a certain region of the temperature scale, to the nearest degree or tenth or ten-millionth of a degree C. But they are not giving you an interval when you asked for a particular value; they are just telling you the granularity of that value.

Please help me to understand your concept of atoms of time: What is the duration of time between Tuesday and Thursday of a given week?

"Two days" in plain English, or "P2D" in ISO. Attempting to be more (or less) precise than that would be Wrong.
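The following Python sketch contrasts the day-granular answer with two instant-based readings of the same question. ISO week 42 of 2020 is an arbitrary choice, and the names are illustrative only.

```python
from datetime import date, datetime, time, timedelta


def weekday_in_iso_week(year: int, week: int, isoweekday: int) -> date:
    """Date of a weekday (1 = Monday ... 7 = Sunday) within an ISO week."""
    return date.fromisocalendar(year, week, isoweekday)


tue = weekday_in_iso_week(2020, 42, 2)
thu = weekday_in_iso_week(2020, 42, 4)

# Day-granular reading: each day is a single moment, so the distance is whole days.
day_distance = thu - tue

# Instant-based readings of the same question give different answers.
one_day = timedelta(days=1)
tue_start = datetime.combine(tue, time.min)
thu_start = datetime.combine(thu, time.min)
end_of_tue_to_start_of_thu = thu_start - (tue_start + one_day)
start_of_tue_to_end_of_thu = (thu_start + one_day) - tue_start
```

Treating each day as one moment gives two days; measuring between instants gives one or three days, depending on which edges of Tuesday and Thursday are chosen.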

There’s a phrase I love that, I think, captures the essence of this kind of debate. It is, “Ontology recapitulates philology.” [...] Similarly, the idea is not creditable that our ontologies should do no more than represent the words we use.

I on the other hand think exactly that: ontologies are a formal method for manipulating clarified ordinary-language concepts, not a collection of already-formalized concepts.

When we hear the statement, “I was born in 1958”, we are able to interpret it precisely only by drawing on a vast quantity of cultural and linguistic knowledge—knowledge that is not natively available to a computing system. We understand implicitly that a time interval of one year is meant. But, in my opinion, I think we do no service if we do not record that contextual information in a knowledge graph when we record the basic statement.

I agree: when we record an atom or moment of time, we should give its granularity explicitly as millennium, century, year, month, fortnight, week, second, various sizes of subseconds. Other granularities are useful too: the Roman week was eight days before they adopted the Jewish seven-day week, for example. We can use either concepts with their own URIs or XSD durations to talk about granularities; it doesn't matter.
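The two options named here (concepts with their own URIs, or XSD durations) can be sketched as Turtle generated from Python. Everything in the `ex:` namespace, including `ex:value`, `ex:granularity`, and `ex:RomanEightDayWeek`, is a hypothetical placeholder rather than gist vocabulary.

```python
# Hypothetical vocabulary: the ex: namespace and its terms are placeholders,
# not gist classes or properties.
PREFIXES = (
    "@prefix ex: <http://example.org/time#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# Granularity as a concept with its own URI, keyed by the equivalent XSD duration.
GRANULARITY_CONCEPTS = {
    "P1Y": "ex:Year",
    "P1M": "ex:Month",
    "P7D": "ex:Week",
    "P8D": "ex:RomanEightDayWeek",
    "P1D": "ex:Day",
}


def moment_turtle(subject: str, value: str, xsd_type: str, granularity: str, as_concept: bool) -> str:
    """Describe a moment and its granularity as a concept URI or an xsd:duration literal."""
    gran = GRANULARITY_CONCEPTS[granularity] if as_concept else f'"{granularity}"^^xsd:duration'
    return (
        f'{subject} ex:value "{value}"^^xsd:{xsd_type} ;\n'
        f"    ex:granularity {gran} .\n"
    )


print(PREFIXES + moment_turtle("ex:birthYear", "1958", "gYear", "P1Y", as_concept=True))
print(moment_turtle("ex:birthYear", "1958", "gYear", "P1Y", as_concept=False))
```

Either form carries the same information; the concept form lets granularities be described further, while the duration form needs no extra vocabulary.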

My whole job in NLP is to figure out how to represent, in a knowledge graph, the difference between shares of stock and market share, especially when the words used overlap so much. If I simply developed a way to represent the words directly, without representing their different meanings, I would be guilty of recapitulating the philology in the ontology rather than representing the meaning.

In that sense I agree.

Therefore I don’t think the argument is a good one that the representation of time in a knowledge graph should exactly parallel how we speak, when such a representation makes it difficult to represent arbitrary time intervals.

But it doesn't. Non-mathematical time intervals aren't equivalent to two mathematical points in time, or to a point in time plus a mathematically precise duration. They are a start moment (with a granularity) and an end moment (with a possibly different granularity). If we speak of the interval representing the life of an option, for example, the time when the option was written may be known to the millisecond or better, but the moment when it ends is conventionally a particular second: 5 PM of a particular day in the future.
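One way to sketch an interval whose endpoints carry different granularities is shown below in Python. The `Moment` and `GranularInterval` classes, the `duration_bounds` operation, and the option dates are all hypothetical; the sketch assumes a moment covers the half-open span from its start to its start plus its granularity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Moment:
    """A granular moment: the span [start, start + granularity)."""
    start: datetime
    granularity: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.granularity


@dataclass(frozen=True)
class GranularInterval:
    """An interval bounded by two moments that may have different granularities."""
    start: Moment
    end: Moment

    def duration_bounds(self) -> tuple[timedelta, timedelta]:
        """Shortest and longest elapsed time consistent with both endpoint moments."""
        shortest = self.end.start - self.start.end
        longest = self.end.end - self.start.start
        return shortest, longest


# Illustrative dates only: written to the millisecond, expiring at 5 PM to the second.
written = Moment(datetime(2020, 10, 14, 13, 50, 12, 345000), timedelta(milliseconds=1))
expires = Moment(datetime(2021, 1, 15, 17, 0, 0), timedelta(seconds=1))
option_life = GranularInterval(written, expires)
```

The gap between the shortest and longest durations is exactly the sum of the two endpoint granularities, here one second plus one millisecond.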

I hope all that is helpful.

tedhills commented 4 years ago

It's true that extensionally "I was born between 1958-01-01 and 1958-12-31 inclusive" means the same thing as "I was born in 1958", but intensionally it is quite different

I cannot see how the extension and intension are different here.

tedhills commented 4 years ago

if you ask a scientist what the temperature is, they don't answer you directly: they say that it is measured to be somewhere in a certain region of the temperature scale, to the nearest degree or tenth or ten-millionth of a degree C. But they are not giving you an interval when you asked for a particular value; they are just telling you the granularity of that value.

I don't think this is accurate. It confuses precision and resolution. The scientist will inform you of the degree to which he was able to reduce experimental error in making a measurement. That is not the granularity of the measurement; it is the degree of accuracy of the measurement.

Some measurable things have granules, for instance, fundamental particle spins come in units of +/- 1/2, and quark charges come in units of +/- 1/3. But most physical phenomena, like temperature and time, don't have granules as far as we know. The degree of accuracy to which we can measure these things does not give them granules.

If we choose a granule when we speak, such as a day, hour, or second, that's fine, but that's not a result of a limit on our ability to measure. It's just an agreed-upon unit of specification.

If I say, "tomorrow", I am not +/- twelve hours uncertain of when that is. I am also not concerned about measuring the beginning and ending of the interval which is "tomorrow" to any degree of accuracy. Rather, I am identifying "tomorrow" as a time interval understood by my listener and me as a 24-hour period beginning at midnight (ignoring time zone issues). And I don't expect my listener to run to his atomic clock and sit up until midnight to observe when tomorrow begins. So when I say "tomorrow", any mention of precision, accuracy, or measurement is irrelevant. I can accept that the granule of tomorrow is "1 day". However, based on this intuitive understanding, I don't see how any person or system is helped by identifying "tomorrow" by noon +/- 12 hours, when the interval beginning at midnight is what is meant. You can do that if you want, but I think I've already outlined how difficult that convention makes it to identify something like a month or a year. It also goes against the grain of many data systems--and therefore many petabytes of data--which identify the beginning of any arbitrary time interval by its first moment, to some degree of resolution (reminding you once again that resolution of specification and accuracy of measurement are two unrelated things).
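A minimal Python sketch of the two ways of identifying "tomorrow" discussed here: by its first instant and a one-day length, or by its midpoint plus or minus twelve hours. The function names are hypothetical, and, as in the comment, time zones (and therefore daylight-saving days that are not 24 hours long) are ignored by using naive datetimes.

```python
from datetime import date, datetime, time, timedelta

ONE_DAY = timedelta(days=1)


def tomorrow(today: date) -> tuple[datetime, datetime]:
    """'Tomorrow' as the half-open interval starting at the next midnight."""
    start = datetime.combine(today + ONE_DAY, time.min)
    return start, start + ONE_DAY


def from_midpoint(mid: datetime, half_width: timedelta) -> tuple[datetime, datetime]:
    """The same kind of interval, identified by its midpoint plus or minus a half-width."""
    return mid - half_width, mid + half_width
```

Both functions identify the same interval, but only the midpoint form requires computing noon before the interval can be recorded.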

johnwcowan commented 4 years ago

They differ in intension because one is a range of moments, as shown by

It's true that extensionally "I was born between 1958-01-01 and 1958-12-31 inclusive" means the same thing as "I was born in 1958", but intensionally it is quite different

I cannot see how the extension and intension are different here.

One is a range of moments specified by start and end points that happen to be a year apart; the other is a moment that endures for a year. That is a difference of intension, even though the truth conditions are the same.

tedhills commented 4 years ago

Ah, now I see that the difference is in our perspective of the time line.

If I am deducing correctly, you see the time line as consisting of discrete moments of fixed duration which can overlap and be added together, but which ultimately have a finite granularity that can be counted. Thus, a year can be described as a range of discrete moments of undefined duration whose combined duration adds up to the duration of a year, or a single discrete moment whose duration is a year. You see the phrase “between 1958-01-01 and 1958-12-31 inclusive” as referring to a collection of moments (of undefined duration) whose total duration is a year, and the phrase “in 1958” as referring to a single moment of duration one year.

In contrast, I see the time line as analogous to the real number line. It is infinitely divisible and not composed of any measurable granules. (This is physically true, as far as we know.) We can pick any two points on the line and define a duration as the time elapsed between the two points. The points themselves are of zero duration and are best referred to as “instants”. The word “moment” is often used as a synonym for “instant”, but it connotes some finite duration, so it is less suitable. In this view of the world, we would never say that one duration is composed of other durations. We would, however, say that some durations are “equal”. For instance, two hours is equal in duration to 120 minutes, but it is not correct to say that two hours are composed of 120 minutes, any more than it is correct to say that twelve is composed of 6 and 6 (or 7 and 5 or 9 and 3 or . . .). Thus, in my view of the world, the phrase “between 1958-01-01 and 1958-12-31 inclusive” (or, to be more precise, “from 1958-01-01 up to but not including 1959-01-01”) does not denote any sequence of moments. Rather, it identifies two zero-length points on the time line, where the duration between them is one year. The phrase “in 1958” connotes and denotes exactly the same thing, by definition.
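The conversion Ted describes, from an inclusive day-granular range to a pair of zero-length instants, can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes naive datetimes with no time zone.

```python
from datetime import date, datetime, time, timedelta


def inclusive_days_to_half_open(first_day: date, last_day: date) -> tuple[datetime, datetime]:
    """Turn 'from first_day to last_day inclusive' (day granularity) into [start, end) instants."""
    start = datetime.combine(first_day, time.min)
    end = datetime.combine(last_day + timedelta(days=1), time.min)
    return start, end


def contains(interval: tuple[datetime, datetime], t: datetime) -> bool:
    """An instant belongs to the interval if start <= t < end."""
    start, end = interval
    return start <= t < end


in_1958 = inclusive_days_to_half_open(date(1958, 1, 1), date(1958, 12, 31))
```

The inclusive range ending on 1958-12-31 becomes the half-open pair from 1958-01-01 up to but not including 1959-01-01, 365 days apart.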

Which to choose? One looks for pragmatics. If I have to calculate the middle instant of a year (noon on July 2 in a common year, or midnight at the start of July 2 in a leap year) in order to record the identity of that year in a data system, because that’s how one identifies a year, I am going to very quickly move to a solution that identifies a year by two instants or by an instant plus a duration. Having worked in this field for decades, I have processed data from many systems, and put data into many systems, by this latter convention, and found the results to be not only more interoperable with existing systems but also more intuitive, and insulated from the “tick size” of system clocks (which invariably kept getting smaller until quite recently).
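The cost of a midpoint-based identifier can be seen in a short Python sketch: the first-instant form is read directly off the calendar, while the midpoint must be computed and depends on whether the year is a leap year. The function names are hypothetical.

```python
from datetime import datetime


def year_interval(year: int) -> tuple[datetime, datetime]:
    """A year identified by its first instant and the first instant of the next year."""
    return datetime(year, 1, 1), datetime(year + 1, 1, 1)


def year_midpoint(year: int) -> datetime:
    """The middle instant of the year, which a midpoint-based identifier would need."""
    start, end = year_interval(year)
    return start + (end - start) / 2
```

`year_midpoint(1958)` returns 1958-07-02 12:00, and `year_midpoint(2020)` returns 2020-07-02 00:00.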

All the best,
Ted Hills
Concepts and Objects LLC
+1 908 200 8713
thills@acm.org


johnwcowan commented 4 years ago

Ah, now I see that the difference is in our perspective of the time line. If I am deducing correctly, you see the time line as consisting of discrete moments of fixed duration which can overlap and be added together, but which ultimately have a finite granularity that can be counted.

Exactly.

In contrast, I see the time line as analogous to the real number line. It is infinitely divisible and not composed of any measurable granules. (This is physically true, as far as we know.)

Well, no. The shortest physically meaningful moment is the Planck time, approximately 10^-43 seconds. But in any case, clocks in the real world tick, and that means we and our computers cannot deal with infinitely divisible time any more than we can deal with arbitrary real numbers. The battery-powered clock on my wall is ticking seconds (but is only accurate to a matter of minutes); the NTP protocol ticks at 2^-32 seconds (about 233 picoseconds, though it is only accurate to a millisecond or so), and the mechanical Clock of the Long Now actually does tick yearly, and is expected to keep going with proper maintenance for the next 10,000 years.
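NTP's tick can be illustrated with a Python sketch that decodes a 64-bit NTP timestamp (32 bits of seconds since 1900-01-01 UTC and 32 bits of binary fraction, as specified in RFC 5905). The function name is hypothetical, the sketch ignores NTP eras after 2036, and because `datetime` resolves only microseconds, the fraction is truncated to that coarser tick.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)  # start of NTP era 0
FRACTION_UNITS = 2 ** 32  # low 32 bits count units of 2**-32 seconds


def ntp_to_datetime(timestamp: int) -> datetime:
    """Convert a 64-bit NTP timestamp (era 0 only) to a UTC datetime.

    datetime resolves only microseconds, so the fraction is truncated:
    the coarser clock tick wins.
    """
    seconds = timestamp >> 32
    fraction = timestamp & 0xFFFFFFFF
    micros = fraction * 1_000_000 // FRACTION_UNITS
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=micros)


TICK_PICOSECONDS = 1e12 / FRACTION_UNITS  # about 232.8 ps per fraction unit
```

A single NTP fraction unit is far below `datetime`'s microsecond resolution, so it disappears entirely in the conversion.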
