icaruseu / mom-ca

Monasterium.net (http://www.monasterium.net/mom) - repository and collaborative archive
https://github.com/icaruseu/mom-ca/wiki
GNU General Public License v3.0

Inconsistent cei:date / cei:dateRange data #4

Open GVogeler opened 10 years ago

GVogeler commented 10 years ago

To check:

GVogeler commented 8 years ago

There are also 300 charters with no cei:date/@value or cei:dateRange/@to at all:

xquery version "3.0";
declare namespace cei = "http://www.monasterium.net/NS/cei";
declare namespace atom = "http://www.w3.org/2005/Atom" ;

<datierungen xmlns:atom="http://www.w3.org/2005/Atom" xmlns:cei="http://www.monasterium.net/NS/cei">
{for $u in collection('/db/mom-data/metadata.charter.public')//atom:entry
where not($u//cei:issued/cei:date/@value) and not($u//cei:issued/cei:dateRange/@from)
return <urkunde>
    <bestand>{$u/atom:id/replace(.,'tag:www.monasterium.net,2011:/charter/(.*?)/[^/]*?$','$1')}</bestand>
    {$u/atom:id}
    {$u//cei:issued}
    </urkunde>}
</datierungen>

StephanMa commented 8 years ago

I would suggest using a repair script to clean up our data wherever a date field is missing.

The following script works on my local machine:

/db/Stephan/repair-date.xql

It collects all charters without any date information and calls the update routine from mycollection:checkUpdateCharter, which inserts any missing field from the charter template.

The repair script is installed on the live environment.

For the future we just have to ensure that all imports fit the requirements.

StephanMa commented 8 years ago

If using a repair script is the recommended approach, how should we deal with the charters that only have a value in cei:dateRange/@from (e.g. 15249999) but no @to value?

/db/mom-data/metadata.charter.public/BISANU/13800901_byz-Lazar.cei.xml
/db/mom-data/metadata.charter.public/BISANU/138801xx-Lazar.cei.xml
/db/mom-data/metadata.charter.public/BISANU/1322xxxx_tpq-Stefan_Decanski.cei.xml
/db/mom-data/metadata.charter.public/BISANU/132405xx_tpq-Stefan_Decanski.cei.xml
/db/mom-data/metadata.charter.public/BISANU/132409xx_tpq-Stefan_Decanski.cei.xml
/db/mom-data/metadata.charter.public/BISANU/137905xx_tpq-Lazar.cei.xml
/db/mom-data/metadata.charter.public/HU-MFL/Toeroekiratok/15619999_115.cei.xml
/db/mom-data/metadata.charter.public/HU-MFL/Toeroekiratok/15689999_77.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_136.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_1477.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_1999.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_630.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_64.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_74.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_93.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/ca12819999.cei.xml
/db/mom-data/metadata.charter.public/DE-AKR/Urkunden/SpAR_Urk_119a.cei.xml

and 3 private charters..
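
A check like this one can be sketched outside the database. The following is only an illustration in Python (not the XQuery used on the server), so that it runs without an eXist-db instance; the function name and the sample XML are hypothetical, and only the cei namespace URI is taken from the query earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the XQuery earlier in this thread.
CEI_NS = {"cei": "http://www.monasterium.net/NS/cei"}


def ranges_missing_to(xml_text):
    """Return the @from values of cei:issued/cei:dateRange elements
    that have a non-empty @from but no (or an empty) @to."""
    root = ET.fromstring(xml_text.strip())
    missing = []
    for rng in root.findall(".//cei:issued/cei:dateRange", CEI_NS):
        has_from = bool(rng.get("from", "").strip())
        has_to = bool(rng.get("to", "").strip())
        if has_from and not has_to:
            missing.append(rng.get("from"))
    return missing


# Hypothetical sample data, not taken from the real collection.
SAMPLE = """
<cei:text xmlns:cei="http://www.monasterium.net/NS/cei">
  <cei:issued><cei:dateRange from="15249999">1524</cei:dateRange></cei:issued>
  <cei:issued><cei:dateRange from="14800101" to="14801231">1480</cei:dateRange></cei:issued>
</cei:text>
"""

if __name__ == "__main__":
    print(ranges_missing_to(SAMPLE))
```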

yngwi commented 8 years ago

What about just manually repairing these few charters?

StephanMa commented 8 years ago

@yngwi yes also possible... but in which way?

A max date in @from doesn't make any sense to me. At the very least we should add the same date to @to as well.

yngwi commented 8 years ago

Usually, a value like 14809999 is just a lazy way of saying 14800101 - 14801231. I think we can treat this kind of value in a to or from attribute just as we would if it appeared in date/@value. I would correct the erroneous values by changing to and from accordingly.
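
As a rough sketch of that expansion (in Python, purely for illustration): the function name is hypothetical, and it assumes that "99" in the month or day position means "unknown". Python's calendar module applies Gregorian leap-year rules, which differ from the Julian calendar for some century years, so February end dates would need separate care for medieval data.

```python
import calendar


def expand_legacy_value(value):
    """Expand a YYYYMMDD value whose month and/or day is '99'
    into an explicit (from, to) pair of YYYYMMDD strings."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError("expected eight digits, got %r" % value)
    year, month, day = int(value[:4]), value[4:6], value[6:8]
    if month == "99":
        # Only the year is known: span the whole year.
        return value[:4] + "0101", value[:4] + "1231"
    if day == "99":
        # Year and month are known: span the whole month.
        last_day = calendar.monthrange(year, int(month))[1]
        return value[:6] + "01", value[:6] + "%02d" % last_day
    # Fully specified date: from and to are identical.
    return value, value


if __name__ == "__main__":
    print(expand_legacy_value("14809999"))
```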

yngwi commented 8 years ago

So, no 9999 in to or from. In my opinion 9999 should be limited to value.

yngwi commented 8 years ago

And even there I don't like it. I would prefer to get rid of it in value too, as those are just lazy date ranges.

GVogeler commented 8 years ago

Users are lazy :-) Wouldn't we then need a mechanism to help users enter this information, e.g. automatic suggestions (as in the Archiveditor or the Personendatenrepositorium)?

yngwi commented 8 years ago

Why not have special input boxes depending on the type of the attribute? xs:date could, for instance, get one of those date input pop-ups.

StephanMa commented 8 years ago

Well, I don't know whether it is possible to choose just a year in this kind of box. Unfortunately, sometimes you don't have more information than just a year...

yngwi commented 8 years ago

If we really want to enforce clear data entries, I wouldn't make the input of just a year possible. All values need to be full xs:dates, date@value is for charters with a clearly known date, all other things need to be a dateRange with both from and to being valid xs:dates. Everything else is, imho, just teaching users to be sloppy.

For me the only open question in this regard is what to do with charters that would appear in a printed edition with something like sine dato. I think it is preferable to force users to still come up with a reasonable dateRange; you can have ranges spanning several hundred years, after all. And most of the time you can at least say whether a charter is from the early, high or late Middle Ages, for example. That is better than nothing. Imho...

Or am I being unreasonable and too much in-your-face? :)

StephanMa commented 8 years ago

I totally agree with you. If just the year is given, they have the opportunity to choose 1252-01-01 to 1252-12-31.

But it's a problem if no date is given. In that case you have to be an archivist or a historian to solve the issue. Students or pupils aren't able to do such a task... But this shouldn't happen often, I hope...

yngwi commented 8 years ago

Yes, but I think we don't want users who are unable to decide this to be able to input anything anyway? There are simpler tasks for them, like the Google scan corrections etc.

I think it's not good to compromise here; we have enough questionable data already. Of course, the question is what to do with the existing charters that don't have this data...

GVogeler commented 8 years ago

  1. If I'm right, xs:date allows simple year entries, but this should be checked against https://www.w3.org/TR/xmlschema-2/#date. If we go for fully fledged xs:date support we would need our own sorting algorithm, as diplomatists sort incomplete dates at the end and not at the start. So in any case we end up with our own self-defined format.
  2. The main question, from my point of view, is how to support users in entering the date. If there is a fast and "intuitive" interface for creating the numerical representation, users will use it. If not, they will avoid it. The calendar box seems to be a good approach, because it hides the formalisation from the user. But creating a calendar box for historical data might be quite a task (are there any students out there willing to write this kind of calendar? :-)) I personally like the idea of fast pattern recognition from words (1356 X 16; [Dezember 1340]; 1245-1258; 934, 16th April ...) and opening the attribute only if the recognition fails to produce sensible data.
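
The sorting concern in point 1 can be illustrated with a small sketch (Python, for illustration only). It assumes incomplete dates are written as 'YYYY' or 'YYYY-MM'; the function name is hypothetical. Padding the missing parts with 99, as the existing 9999 convention does, makes an incomplete date sort after every complete date in the same year or month, whereas plain string order puts it first.

```python
def diplomatic_sort_key(date):
    """Sort key for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings that
    places incomplete dates after the complete ones they cover."""
    parts = [int(p) for p in date.split("-")]
    # Pad a missing month/day with 99 so it sorts after any real month/day.
    while len(parts) < 3:
        parts.append(99)
    return tuple(parts)


# Hypothetical example values.
DATES = ["1356", "1356-10-16", "1356-10", "1355-12-31", "1357-01-01"]

if __name__ == "__main__":
    print(sorted(DATES))                           # plain string order
    print(sorted(DATES, key=diplomatic_sort_key))  # incomplete dates last
```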

Which leads to the main question: what format is most easily entered and can most easily be checked and converted? What about this: marking up text with cei:date triggers an attribute-insertion box with a single field to enter the date. This date is checked against the following rules:

  1. Y?YYY => from="YYYY0101" to="YYYY1231" certainty="year"
  2. Y?YYY-MM => from="YYYYMM01" to="YYYYMM[23][01]" certainty="month"
  3. Y?YYY-MM-DD => from="YYYYMMDD" to="YYYYMMDD" certainty="day"
  4. anything else (e.g. Y?YYY - Y?YYY as "from year to year" or Y?YYY-MM-DD - Y?YYY-MM-DD for "from day to day") => ask user for explicit attribute entering
  5. when no date is given use from="99999999" to="99999999" certainty="none"
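
The five rules above could be sketched roughly as follows (Python, for illustration only; this is not existing MOM-CA code). The function name, the dictionary return shape and the choice to return None for "ask the user" are assumptions, as is zero-padding three-digit years to four digits. Month ends are computed with Python's calendar module, which uses Gregorian leap-year rules.

```python
import calendar
import re

# Y?YYY, optionally followed by -MM and then -DD.
DATE_PATTERN = re.compile(r"(\d{3,4})(?:-(\d{2})(?:-(\d{2}))?)?")


def normalise_date_input(text):
    """Map user input to from/to/certainty following rules 1-5.
    Returns None when the user must enter the attributes explicitly."""
    text = (text or "").strip()
    if not text:
        # Rule 5: no date given.
        return {"from": "99999999", "to": "99999999", "certainty": "none"}
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return None  # Rule 4: anything else.
    year = int(match.group(1))
    yyyy = "%04d" % year
    if match.group(2) is None:
        # Rule 1: year only.
        return {"from": yyyy + "0101", "to": yyyy + "1231", "certainty": "year"}
    month = int(match.group(2))
    if not 1 <= month <= 12:
        return None
    last_day = calendar.monthrange(year, month)[1]
    yyyymm = yyyy + "%02d" % month
    if match.group(3) is None:
        # Rule 2: year and month.
        return {"from": yyyymm + "01", "to": yyyymm + "%02d" % last_day,
                "certainty": "month"}
    day = int(match.group(3))
    if not 1 <= day <= last_day:
        return None
    # Rule 3: full date.
    full = yyyymm + "%02d" % day
    return {"from": full, "to": full, "certainty": "day"}


if __name__ == "__main__":
    for example in ["1252", "1340-12", "1356-10-16", "1245-1258", ""]:
        print(repr(example), normalise_date_input(example))
```

Inputs that look like a date but name an impossible month or day are treated here like rule 4, which is one possible reading of "anything else".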

Btw, if we really tackle this, we should solve #191 together with it!

GVogeler commented 7 years ago

see #510

GVogeler commented 7 years ago

Btw: as for automatic date extraction, https://github.com/HeidelTime/heideltime could be interesting.

GVogeler commented 3 years ago

see also #994