national-gallery / NG-CIIM

Development of Gallery-instigated CIIM configurations and plugins; not the Gallery's CIIM itself.

Events #19

Open richardofsussex opened 1 year ago

richardofsussex commented 1 year ago

I've had a go at Events today, mapping them to the Linked Art framework. See:

http://richardofsussex.me.uk/ng/output/event-7.json http://richardofsussex.me.uk/ng/output/event-22.json http://richardofsussex.me.uk/ng/output/event-25.json

I have made no attempt to represent the recurring nature of the recurring event (Christmas day), but otherwise the results should be worth checking.

richardofsussex commented 1 year ago

I note that the spec shows an interest in representing agents and places. That will be no problem for Linked Art. Do you have example events for which these fields are populated?

RGShepherd commented 1 year ago

Try event-13 in the live index - I'm hoping we've made the mappings in the CIIM ;-)

richardofsussex commented 1 year ago

As in https://data.ng.ac.uk/es/public/_search?q=@admin.id:event-13 ? I don't see any agent or place data there: [image]

RGShepherd commented 1 year ago

No, I'm afraid we haven't added them into the CIIM yet. Hey ho. In due course ...

richardofsussex commented 1 year ago

No problem - but that is my window onto your data.

RGShepherd commented 1 year ago

Understood!

RGShepherd commented 1 year ago

All said, though, I do feel that an event without a timespan is perhaps lacking something? 😉

Alas, the ability to include yearless dates in ISO8601:2000 was deprecated in ISO8601:2004, which rather puts the kibosh on anything except perhaps timespan.identified_by.content for event-22? And I assume Linked Art haven't got as far as ISO8601-2:2019, which looks from a cursory glance as though it would let us use incomplete dates - and define cycles of repetition (though I bet it still can't deal with Easter).

richardofsussex commented 1 year ago

I was thinking of agents and places as being added to events, rather than have them replace timespans! Having said that, it would be perfectly valid to record that person A was born in place B, even if you have no idea when.

The issue of being able to record yearless dates, cycles of repetition, etc., as Linked Data is clearly a requirement that is common to the whole historical research community (i.e. going beyond just museums). Where should we start a conversation about it?

richardofsussex commented 1 year ago

Events now have the correct Linked Art type "Event", and their dates are given as a timespan array of TimeSpan objects.

RGShepherd commented 1 year ago

> I was thinking of agents and places as being added to events, rather than have them replace timespans! Having said that, it would be perfectly valid to record that person A was born in place B, even if you have no idea when.

Sorry - that was just a general comment about the absence of timespans - now fixed: thanks!

RGShepherd commented 1 year ago

> The issue of being able to record yearless dates, cycles of repetition, etc., as Linked Data is clearly a requirement that is common to the whole historical research community (i.e. going beyond just museums). Where should we start a conversation about it?

By and large, it's a discussion that people seem to be avoiding. It gets worse when we consider dates in calendars other than the Gregorian, a distinction which by and large people seem to avoid making: https://rupertshepherd.info/documentation/the-problem-with-dates

As to where to discuss it - who knows? Probably too niche for the MCG. But I would hope it's something which Yale will have wrestled with, given that much of their data will come from the Yale Center for British Art - and Britain notoriously adopted the Gregorian calendar very late.

RGShepherd commented 1 year ago

So I think now the only issues are:

  1. an empty referred_to_by in event-22
  2. do we need to add the classification when generating _label? I'd think not; but if we do, we should remove the second space that's added when concatenating the two values into that field.
  3. related to this: classified_as.classified_as: is this right? Isn't the content here a classification rather than a label/name?
  4. part_of.type (see event-7): perhaps activity rather than HumanMadeObject?
richardofsussex commented 1 year ago
  1. does the odd empty array matter? There are (currently) six different structures which could populate referred_to_by, and I would need to test for the absence of each of them, in order to successfully suppress the array. However, if it matters, I can ...
  2. not sure where I picked up that pattern from, but I can certainly work out where the extra space is coming from (it annoyed me too)
  3. I assumed that "historic event" was simply a descriptive phrase, probably a controlled term. If it's part of a formal classification system, how would I deduce that this is the case, and where would I get the details of that system from?
  4. that's a fair cop. It's a side-effect of trying to generate output from all entity types into the same target structure. I'll see if I can fix it in a suitably generalized manner ...
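
On point 1, one alternative to testing for the absence of each of the six structures would be to prune empty arrays in a post-processing pass over the generated JSON. What follows is only a minimal Python sketch of that idea, not the XSLT used for the output linked in this thread; `prune_empty` and the sample record are hypothetical.

```python
from typing import Any

EMPTY = ([], {}, None)


def prune_empty(node: Any) -> Any:
    """Return a copy of `node` with empty lists, empty dicts and None removed."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            pruned = prune_empty(value)
            if pruned not in EMPTY:
                cleaned[key] = pruned
        return cleaned
    if isinstance(node, list):
        pruned_items = [prune_empty(item) for item in node]
        return [item for item in pruned_items if item not in EMPTY]
    # Strings, numbers and booleans are kept unchanged.
    return node


# Hypothetical record, shaped loosely like the event output discussed here.
record = {
    "type": "Event",
    "_label": "Example event",
    "referred_to_by": [],
    "classified_as": [{"type": "Type", "_label": "festival", "classified_as": []}],
}
cleaned = prune_empty(record)
```

Because the pruning runs bottom-up, an array that only becomes empty after its children are removed is dropped as well.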
richardofsussex commented 1 year ago

OK, bullet points 2 and 4 addressed: see http://richardofsussex.me.uk/ng/ciim7-output/event-7.json

RGShepherd commented 1 year ago
  1. Probably one for @jpadfield and Rob, as potential consumers
  2. I think we should drop it, please. Its label should really be its preferred name.
  3. Well, it's a two-term classification applied in the termlist for an 'event type' field in TMS. The problem with TMS is that you can't tie most terms in termlists to identifiers ... The terms are historic event (for which I can find no useful heading in the AAT hierarchy for events) and festival (actually, probably http://vocab.getty.edu/aat/300069349). But we can keep it as a label if that's more correct.
  4. Thanks! I think I'd assumed you were forking the code based on the value of @datatype.base, as you're outputting to different ontologies based on entity type.
richardofsussex commented 1 year ago

OK, the event records no longer have the prefix in their type.

As regards point 4, I wrote the bulk of the code while I was still working with object records. I've tidied up the immediate issue, but there is more tidying-up I need to do. One potential gotcha is where a record of one type has a link to a record of another type.

RGShepherd commented 1 year ago

Re. the gotcha in point 4, we'd be addressing that with https://github.com/national-gallery/NG-CIIM/issues/22#issuecomment-1670921509 (2nd bullet point)

RGShepherd commented 1 year ago

Although the general rule with the CIIM indexes is that objects link to other entities, and not vice versa.

richardofsussex commented 1 year ago

OK, that will be good. I'll not worry about it for now.

jpadfield commented 1 year ago
  1. I do think the empty classified_as and referred_to_by could be a problem - if we do not have data can we just omit them? Do we want to consider the notion of stub nodes ... basically place holders indicating when we know we do not have data ... might make it clearer and easier to identify gaps ..
  2. The use of "inverted terms" here does not really make sense, as the terms have not been inverted ...
  3. Do we have any additional information relating to the difference between the two names? Only one is marked "_label"; would we want long/short or alternative, ...? Not needed, just wondering.
  4. The Timespan issue could be complicated, and is again an issue relating to what we do when we have no data. As noted before, I did some work a long time ago to parse all of our dates and identify events, starts and finishes, and overlapping decades and centuries. We could reuse some of this if it helps ...
  5. In general I think we need to be consistent in how we format dates for all the entities we express. LA uses the '1763-04-01T00:00:00Z' type of format, and uses the other terms as well, e.g. "end_of_the_begin"; it might be good to do this. We just need a consistent set of rules to determine a range from a given date description (something like https://research.ng-london.org.uk/wiki/index.php/National_Gallery_Display_Date_Descriptors)
  6. If we really wanted to use "dates" when we do not have any, we could just use broad ones ...
  7. Again, this comes down to how we document things: if dates are known you get these data points; if data points are not included it means dates are unknown.
  8. Do we have a list of all of the dates that have values but do not relate to actual numbers ....
richardofsussex commented 1 year ago

"inverted terms" made sense for personal names (agent); not so here, as you say. Here are its sibling concepts: [image] Is any of them a better match for your concept 'sort name'? If not, we could just have a textual label 'sort name'.

RGShepherd commented 1 year ago
  1. None of those; but http://vocab.getty.edu/aat/300451544 seems ideal!
richardofsussex commented 1 year ago

Agreed: done.

RGShepherd commented 1 year ago
  1. to do: please remove empty arrays.
  2. resolved.
  3. to do: in classified_by, let's use our agreed string + uncool URL for now
  4. resolved.
  5. = 1
  6. resolved.
  7. = 7
  8. data here is pretty limited; I'll look at all the values we have. The one problem - as mentioned earlier - is events that are cyclical, which are not amenable to encoding as ISO dates. Then we can look at how best to convert values / strings to CRM-like dates.
  9. = 8
  10. = 8
  11. = 8
  12. = 8
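
For the cyclical events mentioned in point 8, one possible stop-gap is to hold the yearless date in the `--MM-DD` lexical form used by xsd:gMonthDay and expand it to a concrete timespan only when a particular year is wanted. This is purely an illustrative Python sketch: neither Linked Art nor anyone in this thread has agreed to it, and `to_month_day` and `occurrence` are hypothetical names.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def to_month_day(text: str) -> str:
    """Convert a 'D Month' string such as '25 December' to '--MM-DD'."""
    day, month = text.split()
    return f"--{MONTHS.index(month) + 1:02d}-{int(day):02d}"


def occurrence(month_day: str, year: int) -> dict:
    """Expand a '--MM-DD' value into a timezone-free TimeSpan for one year."""
    month, day = month_day[2:].split("-")
    date = f"{year:04d}-{month}-{day}"
    return {
        "type": "TimeSpan",
        "begin_of_the_begin": f"{date}T00:00:00",
        "end_of_the_end": f"{date}T23:59:59",
    }


christmas = to_month_day("25 December")
epiphany = to_month_day("6 January")
```

Moveable feasts such as Easter cannot be expressed this way, since they have no fixed month and day.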
RGShepherd commented 1 year ago

Re. 8, here are all our current event date values:

| EventType | DateText | DateTimeBegin | DateTimeEnd | Count |
| --- | --- | --- | --- | --- |
| festival | NULL | NULL | NULL | 1 |
| historic event | NULL | NULL | NULL | 2 |
| festival | 25 December | NULL | NULL | 1 |
| festival | 6 January | NULL | NULL | 1 |
| historic event | 9 June 1156 | 1156-06-09 | 1156-06-09 | 1 |
| historic event | 1 June 1432 | 1432-06-01 | 1432-06-01 | 1 |
| historic event | 1517 | 1517 | NULL | 1 |
| historic event | 12 February 1554 | 1554-02-12 | 1554-02-12 | 1 |
| historic event | 7 October 1571 | 1571-10-07 | 1571-10-07 | 1 |
| historic event | 22 October 1642 - 3 September 1651 | 1642-10-22 | 1651-09-03 | 1 |
| historic event | 15 May 1648 | 1648-05-16 | 1648-05-16 | 1 |
| historic event | 13 June 1665 | 1665-06-13 | 1665-06-13 | 1 |
| historic event | 19 April 1691 | 1691-04-19 | 1691-04-19 | 1 |
| historic event | 17 May 1756 - 15 February 1763 | 1756-05-17 | 1763-02-15 | 1 |
| historic event | 20 April 1792 - 20 November 1815 | 1792-04-20 | 1815-11-20 | 1 |
| historic event | 20 September 1792 | 1792-09-20 | 1792-09-20 | 1 |
| historic event | 6 November 1792 | 1792-11-06 | 1792-11-06 | 1 |
| historic event | 16 May 1795 | 1795-05-16 | 1795-05-16 | 1 |
| historic event | 21 October 1805 | 1805-10-21 | 1805-10-21 | 1 |
| historic event | 30 October 1813 | 1813-10-30 | 1813-10-30 | 1 |
| historic event | 11 February 1814 | 1814-02-11 | 1814-02-11 | 1 |
| historic event | 19 June 1867 | 1867-06-19 | 1867-06-19 | 1 |
| historic event | 18 March 1871 - 28 May 1871 | 1871-03-18 | 1871-05-28 | 1 |
| historic event | 1 April 1877 - 30 April 1877 | 1877-04-01 | 1877-04-30 | 1 |
| (not assigned) | 15 THURSDAY | 2005-12-15 | 2005-12-15 | 1 |

So by-and-large, I'd say not much date processing needed. Our only real problems are:

richardofsussex commented 1 year ago

I took it from this list that conversion from the DateText values to the DateTimeBegin and DateTimeEnd ones was required, and have done this in the XSLT. However, examining the data I see that all the DateTime values are already recorded in date.from and date.to. Anyway, the code is there should it be needed, and at least it now adds a suitable 'start of day' and 'end of day' Time qualifier to yyyy-mm-dd dates (e.g. event-7).
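
Since both the free text and the structured dates exist, the conversion code could also serve as a cross-check between them. The Python sketch below is an assumption about how such a check might look, not the XSLT itself; `parse_day_month_year` is a hypothetical helper, and the three sample rows are copied from the list of event date values above.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_day_month_year(text: str) -> str:
    """Convert 'D Month YYYY' to 'YYYY-MM-DD'; raise ValueError otherwise."""
    day, month, year = text.split()  # wrong number of parts raises ValueError
    if month not in MONTHS:
        raise ValueError(f"unrecognised month in {text!r}")
    return f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}"


# (DateText, DateTimeBegin) pairs taken from the values listed above.
rows = [
    ("9 June 1156", "1156-06-09"),
    ("15 May 1648", "1648-05-16"),
    ("21 October 1805", "1805-10-21"),
]

# Rows where the free text and the structured date disagree.
mismatches = [(text, begin) for text, begin in rows
              if parse_day_month_year(text) != begin]
```

On these rows the check flags the 1648 entry, where DateText gives 15 May but DateTimeBegin gives 1648-05-16. Values such as "15 THURSDAY" or date ranges raise ValueError, so they would need handling separately.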

RGShepherd commented 1 year ago

Yes, I think safest to stick with conversion of the YYYY[-MM-DD] values in the from and to fields, rather than the free text - likely to be less fragile.

The pedant in me (and I think ISO-8601) thinks we should drop the timezone 'Z' as the times should be assumed to be local / predate the adoption of timezones.

Any thoughts on how to handle timespans without defined beginnings / ends?

richardofsussex commented 1 year ago

Timezone dropped (and support for yyyy-mm added - missed that case).

I think what we have now is logical. If only the start of the timespan is known, just record that fact. Timespans don't need to be complete.

RGShepherd commented 1 year ago

Is there an example I can check? Timezone still appearing in e.g. event-7:

"timespan": {
  "type": "TimeSpan",
  "begin_of_the_begin": "1792-11-06T00:00:00Z",
  "end_of_the_end": "1792-11-06T23:59:59Z",
  "_label": "6 November 1792"
},
richardofsussex commented 1 year ago

Try now - I failed to actually re-process the events! 7 and 25 should be OK now.

RGShepherd commented 1 year ago

Thanks! Please could you also process event-9, so I can double-check an entry with different start and end dates? Looking good so far ...

richardofsussex commented 1 year ago

Done: http://richardofsussex.me.uk/ng/ciim7-output/event-9.json