icaruseu / mom-ca

Monasterium.net (http://www.monasterium.net/mom) - repository and collaborative archive
https://github.com/icaruseu/mom-ca/wiki
GNU General Public License v3.0

semantics of `cei:dateRange/@from|@to` is ambiguous #994

Open GVogeler opened 3 years ago

GVogeler commented 3 years ago

Plain interpretation of cei:dateRange/@from|@to would suggest that the tag describes a time interval. Real usage in MOM-CA is that @from|@to describe the limits of a possible date. In the TEI vocabulary this would be better expressed with @notBefore|@notAfter, which are currently not part of the schema or of MOM-CA.

GVogeler commented 3 years ago

see #191

GVogeler commented 3 years ago

Possible solutions:

  1. change definition of @from|@to in schema
  2. rename all @from|@to to @notBefore|@notAfter and replace all occurrences of @from|@to in the code accordingly
  3. add @notBefore|@notAfter to cei:date and cei:dateRange and modify the code to be able to take these attributes into account when creating sorting dates.

GVogeler commented 3 years ago

The semantics of solution 2 fit TEI, and it should be a realistic amount of work (search and replace in the code base), so it seems to be preferred. The first task would be to check all occurrences of the two attributes in MOM-CA. @NTsch, can you add a list of them here?

yngwi commented 3 years ago

Hi, I don't think this is a feasible step due to the following:

  1. To / From mean something different than not before / not after. Many charters could probably have their date range replaced with an exact value, be that the correct date or the date they must have been created before or after. But there are also many charters where only a general date range is currently known (or can be known), so replacing to / from would mean looking at each charter and making a scientific guess as to the correct date. I don't think that is something the MOM team should do.
  2. It is long established that dates in MOM are either exact or a date range. Each change would have to be communicated to participating archives which could be difficult.
  3. It will be very hard to go over each charter with a query and do the proposed changes while making sure that the huge date inconsistencies in MOM are not further increased.

With regard to the semantics of cei:date/dateRange: considering that before / after are not really a range but rather a specific use of the exact date, how about adding optional qualifiers like cei:date@notBefore and cei:date@notAfter?

GVogeler commented 3 years ago

Yes, re-signing the contract with the data producers when translating cei:dateRange/@from|@to into cei:date/@notBefore|@notAfter would not be a suitable way forward, and, indeed, we cannot check all data individually, but:

  1. Solution 3 will complicate the sorting order: we would have to deal with cei:date/@value|cei:dateRange/@to|cei:dateRange|@notAfter +@* lt 9999999|@*.
  2. I doubt that any archivist has used cei:issued/cei:dateRange in the sense of "It took three months to issue this charter"; rather, they understood @from and @to as the lower and upper boundaries of the possible dates. Clicking through some of the 33947 results of `for $dateRange in collection('/db/mom-data/metadata.charter.public/')//cei:issued/cei:dateRange[not(@from/string()=@to/string())] return $dateRange` seems to support my impression. And you probably know best the reality under which the data was created and inserted into Monasterium.net :-)
  3. Finally, the definition in the schema, "Ein Zeitraum, die Werte des kleinsten und des größten zutreffenden Datums in den Attributen "from" und "to"" (in English: "A period of time, the values of the smallest and the largest applicable date in the attributes 'from' and 'to'"; see https://github.com/icaruseu/mom-ca/blob/master/my/XRX/src/mom/app/cei/xsd/cei.xsd#L4465-L4466), is ambiguous as well.
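
For spot checks outside the database, the filter in the query from point 2 can be approximated in Python. This is only a minimal sketch: the namespace URI, the sample charters and the helper name `non_trivial_ranges` are assumptions made for illustration and are not part of MOM-CA.

```python
import xml.etree.ElementTree as ET

# Assumed CEI namespace URI; adjust if the data declares a different one.
CEI_NS = {"cei": "http://www.monasterium.net/NS/cei"}

# Hypothetical sample data, not taken from Monasterium.net.
SAMPLE = """<charters xmlns:cei="http://www.monasterium.net/NS/cei">
  <cei:issued><cei:dateRange from="12300101" to="12321231">1230-1232</cei:dateRange></cei:issued>
  <cei:issued><cei:dateRange from="12300415" to="12300415">1230 April 15</cei:dateRange></cei:issued>
  <cei:issued><cei:date value="12310101">1231</cei:date></cei:issued>
</charters>"""


def non_trivial_ranges(xml_text):
    """Return cei:issued/cei:dateRange elements whose @from and @to differ.

    Like the XQuery predicate not(@from/string()=@to/string()), a range
    with a missing attribute is also returned.
    """
    root = ET.fromstring(xml_text)
    result = []
    for rng in root.findall(".//cei:issued/cei:dateRange", CEI_NS):
        start, end = rng.get("from"), rng.get("to")
        if start is None or end is None or start != end:
            result.append(rng)
    return result


if __name__ == "__main__":
    for rng in non_trivial_ranges(SAMPLE):
        print(rng.get("from"), rng.get("to"), rng.text)
```

Ranges where @from equals @to are skipped because they only stand in for a single exact date.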

NTsch commented 3 years ago

> @NTsch can you add here a list of them?

Regex dateRange\/@(from|to) returns: from_to_momca.txt
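
A search like this could be reproduced with a few lines of Python. This is a sketch under assumptions: the function names and the list of file suffixes are hypothetical, and it is not necessarily how from_to_momca.txt was produced.

```python
import re
from pathlib import Path

# Same pattern as above; "/" needs no escaping in a Python regex.
PATTERN = re.compile(r"dateRange/@(from|to)")


def scan_text(name, text):
    """Return (name, line number, stripped line) for every matching line."""
    return [
        (name, number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_tree(root, suffixes=(".xql", ".xqm", ".xsl", ".xml")):
    """Scan all files below root whose suffix is in suffixes (an assumed list)."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_text(str(path), text))
    return hits
```

Note that the pattern only finds the literal path step `dateRange/@from` or `dateRange/@to`; code that reaches the attributes in another way would need a separate search.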

GVogeler commented 1 year ago

Dominique Stutzmann has urgently requested that this issue be solved so that he can insert his TEI-based edition of the Fontenay charters into Monasterium.net. I would suggest handling this as follows:

  1. introduce @notBefore and @notAfter into the schema as attributes of date and dateRange to store the more precise data representation, but without using them in the code for sorting etc.
  2. keep the technical requirements of @value, @from and @to in the Monasterium-specific format for sorting etc.

I would then suggest the following policy for usage of the attributes:

  1. if you have a single boundary date in the sense of notBefore or notAfter, use cei:date/@value to translate it for internal use
  2. if you have two boundary dates in the sense of notBefore and notAfter, use dateRange/@from|@to to translate them for internal use.
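
The policy can be sketched as a small mapping function. The assumptions here are that the boundary dates arrive as full ISO dates (YYYY-MM-DD), that the internal format is an 8-digit YYYYMMDD string, and that the original boundaries are kept alongside the translated values; the function names are hypothetical.

```python
from datetime import date


def to_mom(iso_day):
    """Convert a full ISO date such as '1230-04-15' to '12300415'."""
    d = date.fromisoformat(iso_day)
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def internal_date(not_before=None, not_after=None):
    """Return (element name, attributes) following the two policy rules."""
    if not_before is None and not_after is None:
        raise ValueError("at least one boundary date is required")
    if not_before is not None and not_after is not None:
        # Rule 2: two boundaries become dateRange/@from|@to.
        return ("cei:dateRange", {
            "from": to_mom(not_before),
            "to": to_mom(not_after),
            "notBefore": not_before,
            "notAfter": not_after,
        })
    if not_before is not None:
        # Rule 1: a single boundary becomes date/@value.
        return ("cei:date", {"value": to_mom(not_before), "notBefore": not_before})
    return ("cei:date", {"value": to_mom(not_after), "notAfter": not_after})
```

For example, `internal_date(not_before="1230-01-01", not_after="1232-12-31")` yields a `cei:dateRange` with `from="12300101"` and `to="12321231"`, so the existing sorting code would keep working on @value, @from and @to.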

Finally, I would suggest getting rid of the ambiguity as fast as possible by converting the currently existing dateRange data into notBefore and notAfter, following my considerations above. This could be done during the move to Monasterium-NG in the DiDip project.

yngwi commented 1 year ago

I'm not sure why I wrote what I did in 2020, but from my current perspective, practically speaking, @from and @to limit the possible date a charter was issued on if the exact date is not known: @from is used much like @notBefore and @to like @notAfter. The main difference I see between the two variants (apart from the actual names) is that in MOM, from and to are currently supposed to always be used in combination, whereas notBefore/notAfter can usually appear with or without each other, for example to signify that we know the earliest possible date but not the latest. We have usually solved this use case by "forcing" archivists (at least I did when importing things) to choose the most correct approximate start/end date, which usually means writing something like <cei:dateRange from="08000101" to="08991231">9th Century</cei:dateRange>. While this is a more general dating rather than a @notBefore/@notAfter derived from, say, looking at the persons in the charter, the date is usually chosen manually nonetheless and (I think) usually means the same thing. I'm not aware that we use it in the sense of an actual date range during which the charter was issued over a longer time.

yngwi commented 1 year ago

The biggest issue with the current system, in my experience, was always that editMOM does not really use cei:date but adds a date range where to and from are the same. This is of course wrong from a semantic perspective.

NTsch commented 1 year ago

After speaking with @GVogeler, the least complicated measure for now seems to be to introduce @notBefore and @notAfter as attributes for cei:date, so that this information is not lost, and to pick the best available value from them for @value, leaving the code for ordering charters unchanged. I'll make a PR to that effect, unless there are further thoughts.

GVogeler commented 1 year ago

> The biggest issue with the current system, in my experience, was always that editMOM does not really use cei:date but adds a date range where to and from are the same. This is of course wrong from a semantic perspective.

Yes, in fact, in the long run we should replace dateRange with date, allowing all attributes (@from|@to for a period ("in the summer of 1230"), @notBefore|@notAfter ("[1230-1232]") for unknown dates, and @when for a single day), build an internal representation for sorting purposes, and consider how the requirement of having a sorting value can be managed in EditMOM other than by a template with a preset value.
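
One way such an internal sorting representation might look is a single integer key derived from whichever attribute is present. The sketch below is purely illustrative: the precedence order, the sentinel for undated charters and the assumption that all values are YYYYMMDD digit strings are choices made for the example, not decisions taken in this thread.

```python
# Hypothetical sentinel that sorts undated charters last.
UNDATED = 99999999

# Assumed precedence: exact day first, then the upper bounds, then the lower bounds.
PRECEDENCE = ("when", "to", "notAfter", "from", "notBefore")


def sort_key(attrs):
    """Derive one integer sorting value from the attributes of a date element."""
    for name in PRECEDENCE:
        value = attrs.get(name)
        if value:
            return int(value)
    return UNDATED


charters = [
    {"notBefore": "12300101", "notAfter": "12321231"},  # "[1230-1232]"
    {"when": "12300415"},                               # a single day
    {"from": "12300601", "to": "12300831"},             # "in the summer of 1230"
    {},                                                 # no date at all
]
ordered = sorted(charters, key=sort_key)
```

Deriving the key from the attributes on demand would mean that EditMOM does not have to store a preset sorting value in its template.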