wlpotter / csv-to-srophe

A set of XQuery modules for converting CSV data to Srophe-compliant TEI XML records. Developed for Syriaca.org
GNU General Public License v3.0

Determine SNAP relationship output in taxonomy TEI and its associated column names #35

Open wlpotter opened 2 years ago

wlpotter commented 2 years ago

@dlschwartz Could you let me know how you want the SNAP relationships to appear in the TEI along with their crosswalk to Syriaca URIs?

Here is what you said about these columns in #6, for reference:

Column K is a bit of a relic. There was a time when SPEAR was using namespaced relationship types but we've switched over to Syriaca URIs instead. Column K still serves a purpose, however. It offers a crosswalk between Syriaca URIs and equivalent snap relationships. Whatever column heading works for that purpose would be fine.

I think we also need an additional column here. We have relationships that are more precise than SNAP relationships. For example, we get fairly fine-grained regarding clerical relationships: bishop-over-clergy, fellow-monastic, etc. In a LOD environment, we will want to serve up our more specific relationships to SNAP as their rather generic "professional relationship." I haven't worked out how to render this in TEI, but I should probably do that and create a column for that purpose.

dlschwartz commented 2 years ago

@wlpotter I think this issue is coming together with srophe issue #930. I think what we want for this is for column K to be a skos:closeMatch or skos:exactMatch on a <relation> element. See issue 930 for discussion of attributes. In short, however, this should become just another crosswalk column like the current columns H and I.

That said, let me know what you think of this. I think all we want to assert for the crosswalk to LOC, for example, is a skos:closeMatch. In the case of SNAP we mostly want to assert skos:closeMatch [or maybe skos:exactMatch, I probably need to do a bit more work on this]. In the cases discussed in srophe issue #930, we would want to assert skos:broadMatch. What is your preference on the transform side of things:

  1. a column for SNAP concepts for which there is a skos:closeMatch and another column for SNAP concepts for which there is a skos:broadMatch?
  2. one column for the SNAP crosswalk that would have # separated values (like in the persons spreadsheet for PRLE) that would look something like: "skos:broadMatch#snap:professionalRelationship"?

Let's discuss this. Thanks Will.
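A minimal sketch of how option 2 could be read on the transform side, assuming each cell holds exactly one "property#concept" pair as in the example above. The function name `parse_crosswalk_cell` is hypothetical and is not part of csv-to-srophe.

```python
def parse_crosswalk_cell(cell):
    """Split a cell such as 'skos:broadMatch#snap:professionalRelationship'
    into (skos_property, target_concept).

    Assumption: one '#'-separated pair per cell, property first.
    """
    skos_property, separator, concept = cell.strip().partition("#")
    if not separator or not skos_property or not concept:
        raise ValueError(f"expected 'property#concept', got {cell!r}")
    return skos_property, concept


pair = parse_crosswalk_cell("skos:broadMatch#snap:professionalRelationship")
print(pair)
```

With separate columns (option 1), this parsing step disappears because the column heading itself carries the SKOS property.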

wlpotter commented 2 years ago

@dlschwartz We can discuss this, but my first instinct is to say we should have a separate column for each relation 'type'. I could also see an argument for keeping the columns as they are and just renaming. The main benefit is we wouldn't have to do any reorganizing.

So, starting in column H you'd have (for LOC, DNB, ISO Lang Code, SNAP)

relation1.skosCloseMatch | relation2.skosCloseMatch | relation3.skosExactMatch | relation4.skosBroadMatch

The types of relations might change depending on what we decide they should be. You could keep a second row or a comment on these columns to remind encoders which URIs to put in which columns.

The only other change here would be to start the enumeration of columns AF-AK at relation5. (These relations may also change type based on decisions in srophe issue #930)

wlpotter commented 2 years ago

Ah, sorry, I missed the point about needing both skos:closeMatch and skos:broadMatch for SNAP relations. I think we should have a column for each.

So instead of just relation4.skosBroadMatch we would have relation4.skosCloseMatch | relation5.skosBroadMatch. Both relations 4 and 5 could be used for SNAP relations as needed.
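The column naming scheme above can be sketched as follows. This assumes headers always take the form relationN.skosXxx; the pattern and the function name `parse_relation_header` are illustrative only, not the script's actual code.

```python
import re

# Assumed header shape: "relation<N>.skos<LocalName>", e.g. "relation4.skosCloseMatch"
HEADER_PATTERN = re.compile(r"^relation(\d+)\.skos([A-Z][A-Za-z]*)$")


def parse_relation_header(header):
    """Return (column_index, 'skos:localName'), or None for non-relation columns."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        return None
    index = int(match.group(1))
    local = match.group(2)
    # "CloseMatch" -> "closeMatch", matching the SKOS property's local name
    return index, "skos:" + local[0].lower() + local[1:]


for header in ["relation4.skosCloseMatch", "relation5.skosBroadMatch", "label"]:
    print(header, "->", parse_relation_header(header))
```

Because the relation type is read from the header, renaming a column (for instance from skosCloseMatch to skosExactMatch) would change the output without touching the transform.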

dlschwartz commented 2 years ago

@wlpotter I think this sounds good but let's chat about it. Thanks.

dlschwartz commented 2 years ago

@wlpotter I've had a chance to read up a bit more and now I'm following the W3C definitions and what you've written here a bit better. To summarize:

I don't think we should use skos:narrower because I think it is easier to list the "parent/s" of the concepts in each record rather than to list all of the "children" concepts in the parent record. Moreover, skos:broader and skos:narrower are inverses of each other according to the SKOS model: https://www.w3.org/TR/skos-primer/#secrel. By encoding them in a tei:relation with @active and @passive attributes, we will be able to query for either.

Unfortunately, the nesting into something resembling a tree is not automatic, see https://www.w3.org/TR/skos-primer/#sectransitivebroader. Notice there that a "grandparent/grandchild" relationship can be inferred as a skos:broaderTransitive. In an RDF environment I think this should mean that we can query for things like all the descendants of a concept or all the children of a parent concept.

As we work on developing the ontology, we might need to tweak this. At the moment though, I think this is where we should start. Any thoughts?

If we go with this approach, I believe that we would do the following in the spreadsheet:

Btw, a lot of this comes out of https://github.com/srophe/srophe-app-data/issues/930 but I think the discussion belongs here.

dlschwartz commented 2 years ago

@wlpotter I suppose this is the right place to deal with @ref vs. @name. My inclination is to do both but I'm not sure that's right. See: https://www.w3.org/TR/skos-reference/#broader. <relation name="skos:broader" ref="http://www.w3.org/2004/02/skos/core#broader"

wlpotter commented 2 years ago

@dlschwartz This all sounds good.

I think using skos:broader and sticking to it makes sense as it and skos:narrower are inverses.

The lack of transitivity does pose some problems. We could use skos:broaderTransitive, and I think that means we would double up relations:

A skos:broader B; skos:broaderTransitive B. 
B skos:broader C; skos:broaderTransitive C.

This would allow the broader link between A and C. For the spreadsheet, we could implement some way to flag if we want to include a skos:broaderTransitive relation -- maybe a relationN.isTransitive column with a boolean flag.

Maybe an alternative would be to explicitly declare skos:broader for each level of relationship, though depending on the depth of the tree this could be even more tedious.
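As an illustration of the A/B/C example, the ancestor links could also be derived from the direct skos:broader statements rather than entered by hand. The sketch below assumes the direct relations are available as a plain dictionary; `broader_transitive` is a hypothetical helper, not something the script currently does.

```python
def broader_transitive(broader):
    """Map each concept to every ancestor reachable through skos:broader.

    `broader` maps a concept to the set of its direct broader concepts.
    """
    closure = {}
    for concept in broader:
        seen = set()
        stack = list(broader[concept])
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue  # already visited; also guards against cycles
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
        closure[concept] = seen
    return closure


direct = {"A": {"B"}, "B": {"C"}}
print(broader_transitive(direct))
```

Here A ends up with both B and C as ancestors even though only A-to-B and B-to-C were stated directly, so only the direct parents would need to be recorded in the spreadsheet.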


I think the column changes sound good. I will make the adjustments to how the script outputs the relation elements.

For @ref vs @name I agree that we should do both (it's no trouble from a script perspective). The worst case is that one of the attributes is superfluous -- better than losing important data. I will make these script changes and try a few test outputs for you to review.

wlpotter commented 2 years ago

@dlschwartz changing the encoding of SNAP from idno[@type="SPEAR"] to skos:broadMatch or skos:closeMatch tei:relation elements raises a question for column G. Previously this column was included as an @ana attribute on the tei:idno. Should we include this as an @ana on the tei:relation element instead? We could also use a @type and/or @subtype for this?

Also, as we now have closeMatch and broadMatch, we may need two columns for this designation as "directed" or "mutual".

dlschwartz commented 2 years ago

@wlpotter actually, I'm not sure we need this at all. I think it's enough that we have a relationship between our concept and the SNAP concept.

But maybe we should discuss this further. From the perspective of a triple store and of an API sharing data with SNAP, maybe it's best to clearly mark when our concept relates to a SNAP concept. Let's discuss this when we meet this afternoon.

wlpotter commented 2 years ago

@dlschwartz sounds good, let's discuss just this issue to make sure we're on the same page. I believe it may be related to #37 as you mentioned in this comment that

the only use of [column G] information is in SPEAR. I have an xslt that transforms the taxonomy into an index. I use the data here to validate that some relationships get a @mutual attribute while others get @active/@passive.

dlschwartz commented 2 years ago

@wlpotter alright, I'm seeing now that I've got myself in a bind between "browse by" categories and the structured hierarchy of an ontology. I need to re-think some things. It might be easiest just to discuss this afternoon in our meeting.

dlschwartz commented 2 years ago

It might be as simple as putting "browse by" categories as a @subtype on tei:entryFree and using tei:relation elements for the structured hierarchy of the ontology. But let's discuss.

dlschwartz commented 2 years ago

@wlpotter I've been working on the taxonomy relationships. I've grouped them in rows 1049-1132 in the spreadsheet.

Columns K and L should not contain an accurate crosswalk with SNAP. Column K is used only for skos:closeMatch and column L contains skos:broadMatch when there is no skos:closeMatch, i.e. it indicates the narrowest concept in SNAP under which our concept falls. This should allow us to share data with SNAP even when we have relationships they don't have.

Columns AG and AH contain one or more parent concepts for each relationship. They should accurately reflect this SNAP graph minus concepts for which we haven't created a keyword and with our concept keywords added in.

I have a question about the difference between "Link" and "Bond" which leaves me less than clear about where to put things like relationships between events or between persons and objects. I think these are a "Link" while relationships between persons are a "Bond" but I'm not sure about that. Let's not close this issue until I figure that out.

dlschwartz commented 2 years ago

Correction: Columns K and L should NOW contain an accurate crosswalk with SNAP.

wlpotter commented 2 years ago

@dlschwartz these look great! I will have the script output them as follows:

<relation name="skos:closeMatch" ref="http://www.w3.org/2004/02/skos/core#closeMatch" active="http://syriaca.org/keyword/adopted-family-relationship" passive="snap:AdoptedFamilyRelationship"/>

or

<relation name="skos:broadMatch" ref="http://www.w3.org/2004/02/skos/core#broadMatch" active="http://syriaca.org/keyword/alleged-relationship" passive="snap:QualifierRelationship"/>
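A rough sketch of how relation elements of this shape could be produced, written in Python for illustration rather than in the project's XQuery. The function name `build_relation` is hypothetical, and the element is created without the TEI namespace to keep the example short.

```python
import xml.etree.ElementTree as ET

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def build_relation(skos_property, active, passive):
    """Build a <relation> carrying both @name (prefixed) and @ref (full URI)."""
    prefix, _, local = skos_property.partition(":")
    if prefix != "skos" or not local:
        raise ValueError(f"expected a skos: property, got {skos_property!r}")
    return ET.Element(
        "relation",
        {
            "name": skos_property,
            "ref": SKOS_NS + local,
            "active": active,
            "passive": passive,
        },
    )


relation = build_relation(
    "skos:closeMatch",
    "http://syriaca.org/keyword/adopted-family-relationship",
    "snap:AdoptedFamilyRelationship",
)
print(ET.tostring(relation, encoding="unicode"))
```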

This raises one question: the @passive values contain the "snap" namespace prefix, but we don't have this prefix bound anywhere in our data. We could declare the SNAP namespace on the root TEI element, but I'm not sure if attribute values are within the scope of those declarations? Another option would be to find and replace "snap:" with the namespace URI (e.g., "http://data.snapdrgn.net/ontology/snap#AdoptedFamilyRelationship").

(Note that we run into a similar issue with //entryFree/@type which is currently "skos:concept" for most keywords -- is the skos prefix able to be dereferenced as an attribute value?)
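On the scoping question, a quick check with a standard XML parser suggests that namespace declarations apply to element and attribute names but not to attribute values. The sketch below uses Python's ElementTree on a made-up two-element document; the snap namespace URI is the one quoted above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative fragment only: the snap prefix is declared on the root element.
doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:snap="http://data.snapdrgn.net/ontology/snap#">
  <relation name="skos:closeMatch" passive="snap:AdoptedFamilyRelationship"/>
</TEI>"""

root = ET.fromstring(doc)
relation = root.find(f"{{{TEI_NS}}}relation")

tag = relation.tag              # element name: expanded using the namespace
passive = relation.get("passive")  # attribute value: left as a plain string

print(tag)
print(passive)
```

The parser expands the element name but hands back "snap:AdoptedFamilyRelationship" untouched, so any expansion to a full URI would have to be done by the consuming application or by the transform.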

dlschwartz commented 2 years ago

@wlpotter, thanks for the question. I think there are two separate issues here.

Does this all make sense?

wlpotter commented 2 years ago

@dlschwartz yes, I think you're right that the two issues are separate, and I was mostly thinking about the second, LOD issue even though I was perhaps putting it in terms of namespaces.

My concern with only declaring the human-readable prefix is that without some external reference table, these attribute values aren't really machine readable (or at the very least wouldn't be useful as machine-actionable data). Perhaps that's not probable enough to warrant concern though?

From a technical standpoint it would be simple to implement the conversion from snap:x to full URI at the transform level using a simple replace function.
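The replace step could look something like the sketch below, shown in Python rather than the project's XQuery. The prefix table and the name `expand_prefixed` are assumptions for illustration; the snap namespace URI is the one quoted earlier in this thread.

```python
# Hypothetical prefix table; only prefixes listed here are expanded.
PREFIXES = {
    "snap": "http://data.snapdrgn.net/ontology/snap#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_prefixed(value, prefixes=PREFIXES):
    """Turn 'snap:X' into a full URI; leave anything else unchanged."""
    prefix, separator, local = value.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return value  # e.g. already a full http:// URI, or no known prefix


print(expand_prefixed("snap:AdoptedFamilyRelationship"))
print(expand_prefixed("http://syriaca.org/keyword/alleged-relationship"))
```

Values that are already full URIs pass through unchanged because "http" is not in the prefix table.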

dlschwartz commented 2 years ago

@wlpotter Let's talk through what makes most sense tomorrow. Thanks.

wlpotter commented 2 years ago

We will leave the "snap:" in the @passive attribute.

Change "skos:concept" to the full URI (maybe open separate issue?)

wlpotter commented 2 years ago

For column G, add to TEI like this: <note type="relationshipType" subtype="mutual"/>

FYI, #42 is the issue for changing skos:Concept to the full URI

wlpotter commented 2 years ago

I have updated the tei:relation generation to match the comment above. I have also added the note for relationship types to the transform. I will run a new test output to double check, but then I believe this issue can be closed.

wlpotter commented 2 years ago

@dlschwartz when you get a chance, could you take a look at the files from this commit, especially the ones that are relationships with snap close/broad matches? They should be ready to go except for the schemas (on which see #44). The files are also here