time-link / timelink-kleio

Provides translation of files in Kleio notation into XML and other formats. Part of Timelink.

TEP - New format for dates #1

Open joaquimrcarvalho opened 2 years ago

joaquimrcarvalho commented 2 years ago

New format for dates

Preamble

Currently MHK deals with dates in a simplistic way.

Dates are represented internally as YYYYMMDD strings.

In some Kleio formats for acts, the date can be input in separate positional fields, DD/MM/YYYY, which the Kleio translator transforms to YYYYMMDD.

Dates in attributes and relations are always input as YYYYMMDD.

MHK tries to make the format more readable by formatting YYYYMMDD as YYYY-MM-DD on the web frontend.

Partial dates are recorded using 00 for the missing information: 15820000, 15820500.

The rationale for this simplistic format was that a normal string sort on those dates would produce a chronologically ordered list.
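The sorting property can be illustrated with a minimal Python sketch. The sample values are the partial dates mentioned above plus two made-up ones, and `display` is a hypothetical helper name, not part of MHK:

```python
# Dates stored as YYYYMMDD strings; 00 marks a missing month or day.
dates = ["15820500", "15811231", "15820000", "15820517"]

# A plain string sort is also a chronological sort. Partial dates
# sort before the more complete dates that fall inside them.
ordered = sorted(dates)


def display(yyyymmdd: str) -> str:
    """Format YYYYMMDD as YYYY-MM-DD, as described for the web frontend."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:]}"


for d in ordered:
    print(display(d))
```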

Limitations:

Abstract

In order to allow more complex transcription of dates, a special format for date values is introduced.

Requirements:

Examples

Alternative syntax for periods and "after/before" is less readable but sorts alphabetically keeping chronological order.

Specification

Grammar:

date_expression --> single_date
date_expression --> date_range

single_date  --> uncertain_date
single_date  --> certain_date

uncertain_date  -->  date, '?'
certain_date  --> date

date --> fixed_date
date --> relative_date

fixed_date --> year
fixed_date --> year, '-', month
fixed_date --> year, '-', month, '-', day

relative_date --> after_date
relative_date --> before_date

after_date     --> '>', fixed_date
before_date  --> '<', fixed_date

date_range --> date, ':', date
date_range --> date, ':' (open at end)
date_range --> ':', date (open at start)
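As a rough illustration, the grammar above could be parsed along the following lines. This is only a sketch in Python, not the Kleio translator's implementation: the names `Date`, `DateExpression`, `parse_date` and `parse_date_expression` are hypothetical, and it assumes 4-digit years and 2-digit months and days, which the grammar leaves unspecified.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: 4-digit years and 2-digit months and days.
_DATE = re.compile(
    r"(?P<rel>[<>])?(?P<year>\d{4})(?:-(?P<month>\d{2})(?:-(?P<day>\d{2}))?)?"
)


@dataclass
class Date:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    relative: Optional[str] = None  # '>' after, '<' before, None for a fixed date


@dataclass
class DateExpression:
    start: Optional[Date] = None
    end: Optional[Date] = None
    is_range: bool = False
    uncertain: bool = False


def parse_date(text: str) -> Date:
    """date --> fixed_date | relative_date"""
    m = _DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid date: {text!r}")
    month, day = m.group("month"), m.group("day")
    return Date(
        year=int(m.group("year")),
        month=int(month) if month else None,
        day=int(day) if day else None,
        relative=m.group("rel"),
    )


def parse_date_expression(text: str) -> DateExpression:
    """date_expression --> single_date | date_range"""
    text = text.strip()
    if ":" in text:
        # date_range: either side may be empty (open range), but not both.
        left, _, right = text.partition(":")
        if not left and not right:
            raise ValueError("a range needs at least one date")
        return DateExpression(
            start=parse_date(left) if left else None,
            end=parse_date(right) if right else None,
            is_range=True,
        )
    # single_date: a trailing '?' marks an uncertain date.
    uncertain = text.endswith("?")
    date = parse_date(text[:-1] if uncertain else text)
    return DateExpression(start=date, uncertain=uncertain)


print(parse_date_expression("1582-05"))
print(parse_date_expression(">1661-04-13?"))
print(parse_date_expression("1661-04-13:<1663-04-08"))
print(parse_date_expression("1582:"))
```

Following the grammar, the `?` marker is only accepted on single dates, not on the two ends of a range.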

Implementation

The Kleio translator must parse the dates. To do so it needs to introspect the Kleio schema and detect the elements that represent dates. This in turn means that Kleio str files need a base date element from which the concrete elements representing dates inherit. This technique is already used to detect elements that represent days, months and years in acts.

Exported XML data must have extra fields to represent the data structure above.

The database representation must also be extended with extra columns, while keeping the current the_date column for compatibility purposes.

Backwards compatibility

The Kleio translator would also produce a simple YYYYMMDD date for backward compatibility.
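For fixed dates, the legacy value could be derived as in the sketch below, using the existing 00 convention for missing parts. The function name `to_legacy` is hypothetical, and how relative dates and ranges would be reduced to a single YYYYMMDD value is not specified in this proposal, so the sketch does not attempt it.

```python
def to_legacy(fixed_date: str) -> str:
    """Convert YYYY, YYYY-MM or YYYY-MM-DD to YYYYMMDD, padding with 00."""
    parts = fixed_date.split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a fixed date: {fixed_date!r}")
    # Missing month and day become "00", as in the current partial dates.
    year, month, day = (parts + ["00", "00"])[:3]
    return f"{int(year):04d}{int(month):02d}{int(day):02d}"


for value in ("1582", "1582-05", "1582-05-17"):
    print(value, "->", to_legacy(value))
```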

joaquimrcarvalho commented 2 years ago

For the database concepts around time varying attributes see https://en.wikipedia.org/wiki/Temporal_database with an interesting example. For implementation see https://www.postgresql.org/docs/9.2/rangetypes.html

joaquimrcarvalho commented 2 years ago

Similar approach in Signore Oreste, Bartoli Rigoletto, Fresta Giuseppe, Marchetti Andrea, Issues on historical geography, Proceedings of ICHIM'97 - Fourth International Conference on Hypermedia and InterActivity in Museums - Paris, France, 3-5 September, 1997, p. 252-257 (Archives & Museum Informatics, 1997) http://www.archimuse.com/publishing/ichim97/bartoli.pdf

joaquimrcarvalho commented 1 year ago

For an example of how the lack of relative dates gives a misleading picture, see https://timelink.uc.pt/mhk/china/id/deh-albert-le-comte-dorville which contains a few attributes for which no exact date is known but a relative date is (after 1661-04-13, before 1663-04-08).

However the specs above do not solve the situation of several relative dates in succession, such as this example of an itinerary of sorts: first location we know is after 1661-04-13, second is after the first, third after the second,...

joaquimrcarvalho commented 1 year ago

However the specs above do not solve the situation of several relative dates in succession, such as this example of an itinerary of sorts: first location we know is after 1661-04-13, second is after the first, third after the second,...

This could be handled with an extra construct in dates, indicating order within a date:

single_date --> uncertain_date
single_date --> certain_date
single_date --> ordered_date

uncertain_date --> date, '?'
certain_date --> date
ordered_date --> date, '^', N
N --> 1, 2, 3, ... (a positive integer)

        ls$estadia/Sian fou, Shensi/<16610413#até
        ls$estadia/Chamo/>16610413^1
        ls$estadia/Tibete/>16610413^2
        ls$estadia/Lhasa, Tibete#2 meses/>16610413^3
        ls$estadia/Nepal/>16610413^4
        ls$estadia/Bengala/>16610413^5
        ls$estadia/Benares/>16610413^6

Internally the Kleio translator would generate a meta-relation between attributes:

Attributes:

| id | entity | type | value | date | obs |
| --- | --- | --- | --- | --- | --- |
| deh-albert-le-comte-dorville-att553-83 | Deh-albert-le-comte-dorville | Estadia | Sian Fou, Shensi | <16610413 | até. |
| deh-albert-le-comte-dorville-att554-83 | Deh-albert-le-comte-dorville | Estadia | Chamo | >16610413 | |
| deh-albert-le-comte-dorville-att555-83 | Deh-albert-le-comte-dorville | Estadia | Tibete | >16610413 | |
| deh-albert-le-comte-dorville-att556-83 | Deh-albert-le-comte-dorville | Estadia | Lhasa, Tibete | >16610413 | 2 Meses. |
| deh-albert-le-comte-dorville-att557-83 | Deh-albert-le-comte-dorville | Estadia | Nepal | >16610413 | |
| deh-albert-le-comte-dorville-att558-83 | Deh-albert-le-comte-dorville | Estadia | Bengala | >16610413 | |
| deh-albert-le-comte-dorville-att559-83 | Deh-albert-le-comte-dorville | Estadia | Benares | >16610413 | |

Relations between attributes (auto generated from the notation above):

| origin | destination | type | value |
| --- | --- | --- | --- |
| deh-albert-le-comte-dorville-att555-83 | deh-albert-le-comte-dorville-att554-83 | time | after |
| deh-albert-le-comte-dorville-att556-83 | deh-albert-le-comte-dorville-att555-83 | time | after |
| deh-albert-le-comte-dorville-att557-83 | deh-albert-le-comte-dorville-att556-83 | time | after |
| deh-albert-le-comte-dorville-att558-83 | deh-albert-le-comte-dorville-att557-83 | time | after |
| deh-albert-le-comte-dorville-att559-83 | deh-albert-le-comte-dorville-att558-83 | time | after |

Internally the Kleio parser makes a note of the attribute id when it encounters a date of the form Date^N. If N > 1, a relation to the previously noted attribute is generated. If N = 1, the stored ids are cleared and a new sequence starts. An error is generated if two successive dates do not have sequential N.
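The bookkeeping described above might look roughly like this. It is a Python sketch, not the actual Kleio parser code: `ordering_relations` is a hypothetical name, and skipping dates that carry no `^N` marker without resetting the sequence is an assumption the proposal does not settle.

```python
from typing import List, Tuple

Relation = Tuple[str, str, str, str]  # (origin, destination, type, value)


def ordering_relations(attributes: List[Tuple[str, str]]) -> List[Relation]:
    """Build 'time after' relations from (attribute_id, date) pairs in source order."""
    relations: List[Relation] = []
    previous_id = ""
    previous_n = 0
    for attr_id, date in attributes:
        if "^" not in date:
            continue  # assumption: dates without ^N do not affect the sequence
        n = int(date.rsplit("^", 1)[1])
        if n == 1:
            pass  # a new sequence starts; earlier ids are simply forgotten
        elif n != previous_n + 1:
            raise ValueError(f"{attr_id}: expected ^{previous_n + 1}, got ^{n}")
        else:
            relations.append((attr_id, previous_id, "time", "after"))
        previous_id, previous_n = attr_id, n
    return relations


# The itinerary from the example above.
prefix = "deh-albert-le-comte-dorville-att"
itinerary = [(f"{prefix}553-83", "<16610413")] + [
    (f"{prefix}{553 + n}-83", f">16610413^{n}") for n in range(1, 7)
]
for relation in ordering_relations(itinerary):
    print(relation)
```

With the itinerary above this yields five relations, att555 after att554 through att559 after att558, matching the relations table.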

joaquimrcarvalho commented 1 year ago

This problem also exists in RDF ontology notations. See Krieger, H.-U., & Declerck, T. (n.d.). An OWL Ontology for Biographical Knowledge. Representing Time-Dependent Factual Knowledge. https://ceur-ws.org/Vol-1399/paper16.pdf

joaquimrcarvalho commented 1 year ago

See Snodgrass, R. T. (2000). Developing time-oriented database applications in SQL. Morgan Kaufmann Publishers; Kindle JRC.

Introduces the concept of "valid time", the period during which a piece of information is valid.

The author distinguishes between information that is always valid, such as a birth date or place of birth, and information that is valid only during a certain period of time: address, profession, age. The same applies to relationships: "son of" is always valid, but "married to" or "friend of" are valid for specific periods only.

"This difference between the semantics of the BIRTH_DATE column and the other timestamp columns has far-ranging consequences and is the primary impetus for this entire book."

"Such a table [with start and end dates] is called a valid-time table. This table records the history of the modeled reality. The original table, without temporal support, is termed a snapshot table, as it logically captures the state of the enterprise at a single point in time, much as a photographic snapshot does."
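To make the valid-time idea concrete, here is a small sketch using Python's built-in `sqlite3`. The table layout, column names and rows are invented for illustration only; they are not the Timelink schema nor an example taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A valid-time table: each row carries the period during which it holds.
# Dates are ISO strings; a NULL valid_to means "still valid".
conn.execute(
    "CREATE TABLE attribute ("
    "entity TEXT, type TEXT, value TEXT, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?, ?, ?)",
    [
        ("p1", "residence", "Coimbra", "1580-01-01", "1585-06-30"),  # made-up rows
        ("p1", "residence", "Lisboa", "1585-07-01", None),
    ],
)


def value_at(entity: str, type_: str, day: str):
    """Return the value valid on the given ISO date, or None."""
    row = conn.execute(
        "SELECT value FROM attribute WHERE entity = ? AND type = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to >= ?)",
        (entity, type_, day, day),
    ).fetchone()
    return row[0] if row else None


print(value_at("p1", "residence", "1582-05-17"))
print(value_at("p1", "residence", "1590-01-01"))
```

A snapshot table would hold only one of the two rows, with no way to ask what was valid on an earlier date.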

joaquimrcarvalho commented 1 year ago

See SQLAlchemy's support for PostgreSQL ranges, including date and timestamp ranges: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#range-and-multirange-types