linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

Document activity partitioning model for roles #51

Closed: azaroth42 closed this issue 7 years ago

azaroth42 commented 7 years ago

And update technique documentation to cover this.

(That is: if an individual had a particular role, then there is an activity that is part of the production activity and is classified according to the technique or role.)
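As a rough illustration of the partitioning model, here is a sketch of a production split into two classified part activities, written as a Python dict shaped like Linked Art JSON-LD. The keys `part`, `classified_as` and `carried_out_by` are assumed to stand for P9 consists of, P2 has type and P14 carried out by; every `ex:` identifier and term label is a hypothetical placeholder, not a real AAT URI.

```python
# Hypothetical identifiers throughout; real data would use AAT or similar term URIs.
production = {
    "id": "ex:production/1",
    "type": "Production",
    # P9 consists of: one part activity per role or technique
    "part": [
        {
            "id": "ex:production/1/designing",
            "type": "Production",
            "classified_as": [{"id": "ex:term/designing", "_label": "designing"}],
            "carried_out_by": [{"id": "ex:person/a", "type": "Person"}],
        },
        {
            "id": "ex:production/1/casting",
            "type": "Production",
            "classified_as": [{"id": "ex:term/casting", "_label": "casting"}],
            "carried_out_by": [{"id": "ex:person/b", "type": "Person"}],
        },
    ],
}


def roles_by_actor(prod):
    """Map each actor id to the labels of the part activities they carried out."""
    roles = {}
    for part in prod.get("part", []):
        labels = [c["_label"] for c in part.get("classified_as", [])]
        for actor in part.get("carried_out_by", []):
            roles.setdefault(actor["id"], []).extend(labels)
    return roles


print(roles_by_actor(production))
# {'ex:person/a': ['designing'], 'ex:person/b': ['casting']}
```

The role is never stated on the person; it is read off the classification of the part activity they carried out.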

https://linked-art.slackarchive.io/general/page-13/ts-1496243675673191

Conal-Tuohy commented 7 years ago

... rather than the several actors each participating in a single activity, with their roles distinguished using P14.1 in the role of.

These CRM "metaproperties" (i.e. properties of properties, such as P14.1 in the role of) are a bit of a bastard in RDF; the standard practice is to express a metaproperty as an rdfs:subPropertyOf of the "parent" property, expressing the typology (such as the variety of productive roles) as distinct properties in the ontology. This is inconsistent with the general approach elsewhere in CRM of using thesauri for classifying entities into distinct types, using CRM's generic P2 has type property to link an entity to its type.

By contrast, dividing a collaborative activity into multiple activities, each with only one actor (or at least a set of actors all playing the same role), lets you use the P2 has type mechanism to classify the activity entities.

I think this is a very reasonable approach, but if that's the recommendation then I think it's also worth mentioning the "metaproperties" alternative in the documentation, in order to explicitly deprecate it.
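If the documentation does describe the sub-property alternative in order to deprecate it, a migration path might look like the following sketch, which rewrites (activity, sub-property of P14, actor) triples into typed part activities. The sub-property names, the term URIs and the `/part/n` identifier scheme are all hypothetical; triples are plain Python tuples rather than an RDF library's objects.

```python
from collections import defaultdict

# Hypothetical sub-properties of P14 and the vocabulary terms they stand for.
SUBPROPERTY_TO_TERM = {
    "ex:carried_out_as_sculptor": "ex:term/sculpting",
    "ex:carried_out_as_founder": "ex:term/casting",
}


def partition(triples):
    """Rewrite (activity, sub-property, actor) triples as typed part activities.

    Actors sharing a role within one activity are grouped into a single part.
    Parts are numbered in order of first appearance across the whole input.
    """
    grouped = defaultdict(list)  # (activity, term) -> actors, in input order
    for activity, prop, actor in triples:
        grouped[(activity, SUBPROPERTY_TO_TERM[prop])].append(actor)

    out = []
    for n, ((activity, term), actors) in enumerate(grouped.items()):
        part = f"{activity}/part/{n}"  # hypothetical id scheme
        out.append((activity, "crm:P9_consists_of", part))
        out.append((part, "rdf:type", "crm:E7_Activity"))
        out.append((part, "crm:P2_has_type", term))
        out.extend((part, "crm:P14_carried_out_by", a) for a in actors)
    return out
```

Grouping same-role actors into one part follows the "or at least, a set of actors all playing the same role" allowance above, which keeps the number of parts down.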

azaroth42 commented 7 years ago

Yes -- RDF (and hence Linked Open Data) doesn't support relationships on relationships, as there's no concept of a relationship instance... resulting in the weirdness you mention. Agree that the documentation should explicitly mention them as not the way to do this.

Conal-Tuohy commented 7 years ago

NB there are a bunch of other sub-typing properties in the CRM:

Property | Subtype property
--- | ---
P3 has note | P3.1 has type
P14 carried out by | P14.1 in the role of
P16 used specific object | P16.1 mode of use
P19 was intended use of | P19.1 mode of use
P62 depicts | P62.1 mode of depiction
P67 refers to | P67.1 has type
P69 has association with | P69.1 has type
P102 has title | P102.1 has type
P107 has current or former member | P107.1 kind of member
P130 shows features of | P130.1 kind of similarity
P136 was based on | P136.1 in the taxonomic role
P137 exemplifies | P137.1 in the taxonomic role
P138 represents | P138.1 mode of representation
P139 has alternative form | P139.1 has type
P144 joined with | P144.1 kind of member

Are any of these also issues for LinkedArt? Can the same pattern always be applied as you suggest here?

E.g. P62 depicts has the subtype property P62.1 mode of depiction, but P62 depicts is itself a shortcut for P65 shows visual item, whose value is an instance of E36 Visual Item which P138 represents the depicted entity. Adopting the pattern you've proposed here for activities, the E36 Visual Item instance could be typed (using P2 has type), obviating the need for a metaproperty.
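A minimal sketch of that expansion, assuming triples are plain Python tuples with `crm:` and `rdf:` prefixed names: the shortcut is replaced by the P65/E36/P138 path, and what P62.1 would have carried becomes a P2 type on the visual item. The visual-item identifier scheme and the term URIs are hypothetical.

```python
def expand_depiction(thing, subject, mode=None, visual_item=None):
    """Spell out the long path behind the P62 depicts shortcut.

    The mode of depiction, which P62.1 would have carried, becomes a P2 type
    on the E36 Visual Item instead.
    """
    vi = visual_item or f"{thing}/visual-item"  # hypothetical id scheme
    triples = [
        (thing, "crm:P65_shows_visual_item", vi),
        (vi, "rdf:type", "crm:E36_Visual_Item"),
        (vi, "crm:P138_represents", subject),
    ]
    if mode is not None:
        triples.append((vi, "crm:P2_has_type", mode))
    return triples


for t in expand_depiction("ex:painting/1", "ex:person/x", "ex:term/caricature"):
    print(t)
```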

azaroth42 commented 7 years ago

azaroth42 marks this issue as being discussed in slack

azaroth42 commented 7 years ago

Split the .1 issue out separately from this one to #55 so we can consider them as a whole.

azaroth42 commented 7 years ago

This is done here: http://linked.art/model/provenance/#multiple-artists-with-roles
Closing.

natuk commented 6 years ago

Following a discussion with @workergnome last week I have two questions about this approach:

  1. I have a musical performance with two musicians on stage, both of whom can play the piano and the guitar, and I want to specify who plays what in this instance. To use this approach I need to break the performance event down into multiple performance sub-events (one for each player) and give separate identifiers to what one would expect to be one thing. Imagine scaling this to a symphony orchestra. This is a real example which came up after discussions at OeRC.
  2. I think that not using the P14.1 property means that you are not actually modelling the role of the actor at all. A reasoner would not be able to discover the role through sub-events. Isn't this the reason why P14.1 is there? Forgive any misunderstandings; I have only just begun to read your documentation.

azaroth42 commented 6 years ago

Re 1. Yes, but you would need to do that anyway. The same player might play different instruments, even within the same performance, let alone within the same group over time.
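A sketch of what this looks like for the piano/guitar case: one part activity per (player, kind of playing) pair, so the same player can appear in several parts. The dict keys follow Linked Art's JSON-LD naming by assumption, and the identifiers and "piano playing"/"guitar playing" terms are hypothetical.

```python
def partition_performance(performance_id, assignments):
    """Build a performance with one part per (musician, kind of playing) pair.

    assignments: list of (musician_id, playing_term) tuples; the same musician
    may appear more than once if they switch instruments.
    """
    parts = [
        {
            "id": f"{performance_id}/part/{i}",  # hypothetical id scheme
            "type": "Activity",
            "classified_as": [{"id": term}],      # e.g. a "piano playing" term
            "carried_out_by": [{"id": musician}],
        }
        for i, (musician, term) in enumerate(assignments)
    ]
    return {"id": performance_id, "type": "Activity", "part": parts}


performance = partition_performance(
    "ex:performance/1",
    [
        ("ex:musician/1", "ex:term/piano-playing"),
        ("ex:musician/2", "ex:term/guitar-playing"),
        ("ex:musician/1", "ex:term/guitar-playing"),  # same player, second instrument
    ],
)
print(len(performance["part"]))  # 3
```

The number of parts grows with the number of distinct (player, kind of playing) pairs, so an orchestra needs one part per player per instrument played.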

Re 2. The .1 properties are impossible in RDF.

natuk commented 6 years ago

Yes, you are right; I can see the logic of this. But the semantics are lost. RDF reification also has weak semantics. I am just wondering whether a subclass of Activity (e.g. a Sole Activity with two new 1:1 properties) would help solve the problem (similar, I think, to how PROV qualified terms do it) while also maintaining the semantics.
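For comparison, this is roughly what the reification route looks like: the standard `rdf:Statement` vocabulary describes a P14 triple, and the role is attached to that statement resource. Using the P14.1 identifier as a predicate on the statement is an assumption for illustration. It also shows the weakness: reification describes the triple without asserting it, so the P14 link itself is not in the graph unless added separately.

```python
def reify(subject, predicate, obj, statement_id):
    """Describe one triple with the standard RDF reification vocabulary."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


stmt = "ex:statement/1"  # hypothetical identifier
triples = reify("ex:performance/1", "crm:P14_carried_out_by", "ex:person/a", stmt)
# Hang the role off the statement resource, standing in for P14.1 in the role of.
triples.append((stmt, "crm:P14.1_in_the_role_of", "ex:term/pianist"))

# Reification describes the triple without asserting it.
print(("ex:performance/1", "crm:P14_carried_out_by", "ex:person/a") in triples)  # False
```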

workergnome commented 6 years ago

Hiya, @natuk. What semantics do you feel are being lost?

You could certainly solve the problem by creating subclasses of Activity for each possible refinement of an activity. Either Performance, with a property for the instrument and a property to describe the part, or, more specifically, PianoPerformance and GuitarPerformance, where the instrument is explicitly associated with the Class. You could even get more specific, with a class of PianoPerformanceOfBethovensSeventh, which is semantically rich.

What you lose, in the absence of a rich inferencing engine, is the generalizability of retrieving that information--you also require the creator of the information to write very detailed documentation that describes the precise semantics of each of those activities--and those semantics are often rich enough that they're no longer applicable to any other domain, or even any other performance! I think of this as the "TEI Problem". TEI is powerful enough to allow a scholar to describe in very precise detail both their manuscript and their process for interpreting that manuscript, but that descriptive power makes it almost impossible to reuse tooling across TEI documents in general. The solution there, which has influenced our thinking, is to define specific patterns and subsets of TEI and refer to them as "profiles of TEI". Linked Art intends to do something similar for CIDOC-CRM.

Yes, this means that our results are often more verbose than strictly required. And it means that we assume a lot of nuance is carried by the hierarchies from which we pull our terms, rather than by the explicit RDF of the description. But my feeling is that these documents are designed to be read by computers, which prefer explicit patterns, and that we are better at dealing with large amounts of data than with large amounts of code and more concise data.

natuk commented 6 years ago

Yes, I agree, it is a complex problem. When I say "the semantics are lost" I mean that if you use the E7 Activity class for sub-events and you assign only one actor to each, you imply in the structure that any typed role applies to that actor, but this is not made explicit. Or is it? The P14 property on its own is not enough to express the role, and a CRM-compatible reasoner would not pick it up. You are right: what I am suggesting moves towards the "TEI problem" and would not work with standard CRM queries either, but at least the role would be explicitly expressed. But maybe this is not much of a gain. Sorry, just trying to fully understand the issue and your approach.

azaroth42 commented 6 years ago

Have you looked at this section: http://linked.art/model/provenance/production.html#multiple-artists-with-roles ?

The role is expressed as a P2_has_type / classified_as on the Activity, as carried out by the actor.

natuk commented 6 years ago

Yes, so a CRM reasoner would be able to identify the various AAT classes linked to activities, but it would have no information about the role of the actor. It would not be aware of the bit of logic which says that "a type (and I am not sure which one, if I have many) of a sub-activity with only one actor indicates the type of the actor's role". Perhaps this is not an issue for you, as you can build that logic in your application, but this is where I think the semantics of the data disappears.
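The "bit of logic" described here can be written down as an application-level rule. The sketch below is a hypothetical helper over plain tuple triples, not part of CRM or Linked Art: it reads a role off each part activity that has exactly one P2 type, and flags parts with zero or several types as ambiguous, which is exactly the case the rule cannot resolve on its own.

```python
from collections import defaultdict

P9 = "crm:P9_consists_of"
P2 = "crm:P2_has_type"
P14 = "crm:P14_carried_out_by"


def infer_roles(triples):
    """Read roles off partitioned sub-activities, flagging ambiguous parts.

    Rule (application-level, not CRM semantics): if a part activity has exactly
    one P2 type, each of its P14 actors is taken to have that role in the parent.
    """
    parent_of = {}
    types = defaultdict(set)
    actors = defaultdict(set)
    for s, p, o in triples:
        if p == P9:
            parent_of[o] = s
        elif p == P2:
            types[s].add(o)
        elif p == P14:
            actors[s].add(o)

    roles = defaultdict(set)  # (actor, parent activity) -> role terms
    ambiguous = []            # parts with zero or several types
    for part, parent in parent_of.items():
        if len(types[part]) != 1:
            ambiguous.append(part)
            continue
        (role,) = types[part]
        for actor in actors[part]:
            roles[(actor, parent)].add(role)
    return dict(roles), ambiguous
```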

Conal-Tuohy commented 6 years ago

The pattern which Linked Art uses (typing the sub-activity, rather than sub-typing the actor's role in that activity) seems desirable to me. In the CRM, taxonomies of all kinds are consistently modelled as external vocabularies (thesauri), in which each taxon is an instance of E55 Type and is assigned to individuals using P2 has type (or to an instance of a property, such as a P14 carried out by link, using P14.1 in the role of). I think maintaining this consistency is convenient and useful.

The problem for us in the Linked Data world is that properties of properties, such as P14.1 in the role of, are simply impossible in the RDF metamodel. Therefore, if we wish to continue the CRM's "external vocabulary" pattern to classify roles in activities, we are forced to partition the activities into sub-activities (in which all actors must play the same role), and apply distinct types to those sub-activities. But, as @natuk points out, this leaves each actor playing the same generic role rather than a specifically typed role.

The CRM working group also suggest that the alternative pattern can be used in RDF: you define a taxonomy of properties using rdfs:subPropertyOf. This has the advantage that a reasoner can recognise e.g. "violin-playing" as a variety of "musical-instrument-playing", but the disadvantage of not conforming to the general CRM pattern of using external vocabularies.
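To make the reasoner behaviour concrete, here is a small forward-chaining sketch of the relevant RDFS entailment: every triple using a sub-property also holds for all of its super-properties (rule rdfs7, applied along the transitive closure that rule rdfs5 gives). It works on plain tuple triples; the `ex:` property names are hypothetical.

```python
SUB = "rdfs:subPropertyOf"


def entail_subproperties(triples):
    """Add (s, super, o) for every (s, p, o) whose p has super-properties.

    Applies rdfs7 along the transitive closure of rdfs:subPropertyOf (rdfs5),
    without materialising the closure triples themselves.
    """
    supers = {}  # property -> direct super-properties
    for s, p, o in triples:
        if p == SUB:
            supers.setdefault(s, set()).add(o)

    def all_supers(prop):
        seen, stack = set(), [prop]
        while stack:
            for parent in supers.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p != SUB:
            inferred.update((s, parent, o) for parent in all_supers(p))
    return inferred
```

A query for plain P14 then finds the violinist without knowing anything about violin-playing.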

However, I wonder if @natuk's point could be addressed by asserting some OWL axioms to effectively re-express a model that uses Linked Art's pattern as an equivalent model using the rdfs:subPropertyOf pattern (from the point of view of a Linked Art client equipped with OWL reasoning)? For an example of what I'm talking about ('rolification'), see particularly "Case 2b" in this paper: http://daselab.cs.wright.edu/pub2/owled15-types.pdf
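A sketch of the entailment such axioms would produce, emulated as a forward rule in Python rather than run through an OWL reasoner. The comment gives roughly the OWL 2 axioms in functional syntax; `:SculptingActivity`, `:R_SculptingActivity`, `:carried_out_by_sculptor` and the term URIs are hypothetical names, and this is an illustration of the idea, not the paper's exact construction.

```python
P2 = "crm:P2_has_type"
P14 = "crm:P14_carried_out_by"

# Roughly the OWL 2 axioms being emulated (functional syntax, hypothetical names):
#   EquivalentClasses(:SculptingActivity ObjectHasValue(crm:P2_has_type :sculpting))
#   EquivalentClasses(:SculptingActivity ObjectHasSelf(:R_SculptingActivity))
#   SubObjectPropertyOf(ObjectPropertyChain(:R_SculptingActivity crm:P14_carried_out_by)
#                       :carried_out_by_sculptor)


def rolify(triples, type_to_subproperty):
    """Derive (activity, role sub-property, actor) from P2 typing plus P14."""
    subprops_of = {}  # activity -> role sub-properties implied by its types
    for s, p, o in triples:
        if p == P2 and o in type_to_subproperty:
            subprops_of.setdefault(s, set()).add(type_to_subproperty[o])
    inferred = set(triples)
    for s, p, o in triples:
        if p == P14:
            inferred.update((s, sp, o) for sp in subprops_of.get(s, ()))
    return inferred
```

Data written in the typed-activity pattern thus also yields the sub-property triples that the rdfs:subPropertyOf pattern would have stated directly.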

workergnome commented 6 years ago

I suppose the question is if there is a significant semantic distinction between "a person who participated in the activity of sculpting" and "a person who participated in an activity as a sculptor". To me, these are functionally equivalent statements: one puts the type on the activity, the other on the person, but they express roughly the same meaning. It's similar to dates, where you can talk about the botb/bote/eotb/eote (begin of the begin, begin of the end, end of the begin, end of the end), or you can talk about the possible interval and the definite interval, but both express the same concept.

I prefer the type on the activity because, while the type applies completely to the event, if it is applied to the person it still needs to be associated with an event, because the role is time-boxed. (I also like it because it doesn't use the .1 properties or their reified CRMpc equivalents, which are basically unfindable on the internet.)

I'd be happy to hear an argument that explains why those are different statements, though. It does make it difficult to talk about "roles" per se, but I think roles are really a gloss on the concept "participation in specific types of events". As a role, a sculptor is only a sculptor when he is sculpting. (Identity and labels are a different concept.)

natuk commented 6 years ago

@Conal-Tuohy described the issue better than me. Regarding @workergnome's question:

> I suppose the question is if there is a significant semantic distinction between "a person who participated in the activity of sculpting" and "a person who participated in an activity as a sculptor".

It depends on the context:

  • There is a difference if more than one person participated in the activity in different roles, e.g. a "master painter" and the "assistants" all contributed to the one "painting" activity.
  • There is no difference if we have two activities "master painter painting" and "assistant painting", which is what is proposed here, presumably joined with P9 consists of.

In the second case we must have only one actor contributing to the Activity. The CRM defines the quantification (used for semantic clarity) of P14 as: "many to many, necessary (1,n:0,n)" not as "one to one (1,1:1,1)" [or "many to one (1,1:0,n)"], which I think is what the sub-event approach requires. So there is divergence in the semantics. However, in comparison to the problems introduced by reification, I think that this divergence is a minor issue. I think the verbosity of the approach would be more of an issue for most people.

The "rolification" example in the quoted paper is interesting and with a bit of imagination I can see how it could work with defining sub-properties locally. Thank you for your thoughts.

azaroth42 commented 6 years ago

> There is a difference if more than one person participated in the activity in different roles, e.g. a "master painter" and the "assistants" all contributed to the one "painting" activity.

An Activity can be carried out by multiple actors. There's no problem with this.

I disagree that sub-events require a 1:1 or *:1 predicate. We can simply use P14 with one actor in that situation, and with multiple actors in other situations.

Anyone concerned with verbosity is going to have a bad day with RDF in general, and CRM especially.

Conal-Tuohy commented 6 years ago

Here is another idea for representing situations in which multiple people play different roles in a single activity: rather than split the activity into sub-activities each implicitly with one role, you could bundle the people into Groups, each implicitly with one role.

In the case of the National Museum of Australia, we have data in which people are listed as producers of an object, with a "role" which is merely a textual label (such as "sculptor"). This is awkward for us since it's not a good label for an activity ("sculpture" would be the appropriate label).

If however we were to say that the Production was carried out by a group (whose members were the actual sculptors), then we could indeed label the group with the term "sculptor". This would be a case of using a Group to represent a kind of role; a similar usage of Group is foreseen in the scope notes: http://www.cidoc-crm.org/Entity/e74-group/version-6.2

..."In the wider sense this class also comprises official positions which used to be regarded in certain contexts as one actor, independent of the current holder of the office, such as the president of a country. In such cases, it may happen that the Group never had more than one member. " (emphasis mine)

While these different production roles aren't quite the same as official positions such as president of a country, they are not too dissimilar. What do others think? Is this a better approach than sub-activities?
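The Group pattern proposed here might be sketched as follows, using plain tuple triples with CRM property names. The group identifier, the person identifiers and the use of a bare string for the `rdfs:label` literal are simplifying assumptions. The role label sits on the Group, i.e. on an Actor, rather than on the Activity.

```python
P14 = "crm:P14_carried_out_by"
P107 = "crm:P107_has_current_or_former_member"


def role_group(production, role_label, members, group_id):
    """Say 'members acted as <role_label> in production' via a role-labelled Group."""
    triples = [
        (production, P14, group_id),
        (group_id, "rdf:type", "crm:E74_Group"),
        (group_id, "rdfs:label", role_label),  # plain string stands in for an RDF literal
    ]
    triples += [(group_id, P107, m) for m in members]
    return triples


def members_in_role(triples, production, role_label):
    """Find members of groups with the given label that carried out the production."""
    groups = {o for s, p, o in triples if s == production and p == P14}
    labelled = {s for s, p, o in triples
                if s in groups and p == "rdfs:label" and o == role_label}
    return {o for s, p, o in triples if s in labelled and p == P107}
```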

natuk commented 6 years ago

This is perhaps more appropriate, in that the role is assigned to an Actor as opposed to the Activity. However, there may still be a semantic divergence since the Group requires a formation event. The "presidents" group is formed when the constitution is signed and the first president is elected. I do not think we can claim that a Group is formed when a sculptor starts work on a sculpture. I still think that a sub-class of Activity, say "Sole-activity", is semantically more correct, as it can explicitly specify Activities with only one actor and a separate property to specify the role.

P.S. Having said that, I am still using the PC properties in my mappings.

workergnome commented 6 years ago

@Conal-Tuohy, I think the issue with the Group pattern is that if you want to associate any temporal or location based information with the activity, you're going to have to end up with sub-events anyway--the group cannot have a location or a timespan directly associated with it. You'd end up having to do that with joining/leaving events, which is even more awkward.

It sounds like the issue you're running into is that your data is in the wrong Part of Speech form. I think what we're looking for in that case is a mechanism to help with alignment between these representations:

role | activity | object type
--- | --- | ---
sculptor | sculpting | sculptures
painter | painting | paintings
draftsman | drawing | drawings

or something like that.
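A minimal sketch of such an alignment mechanism: the table rows become records, and a lookup converts a value from one column to another, e.g. a "sculptor" role label in the source data to the "sculpting" activity term used for classification. Plain English words stand in for what would really be vocabulary URIs; the function name and the row structure are hypothetical.

```python
# Hypothetical alignment rows; a real table would map vocabulary URIs, not words.
ALIGNMENT = [
    {"role": "sculptor", "activity": "sculpting", "object_type": "sculptures"},
    {"role": "painter", "activity": "painting", "object_type": "paintings"},
    {"role": "draftsman", "activity": "drawing", "object_type": "drawings"},
]


def align(value, source, target):
    """Translate a value from one column (part of speech) to another, or None."""
    for row in ALIGNMENT:
        if row[source] == value:
            return row[target]
    return None


print(align("sculptor", "role", "activity"))  # sculpting
```

Keeping the alignment as data rather than code means new rows can be added without touching the mapping logic.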