wlpotter / csv-to-srophe

A set of XQuery modules for converting CSV data to Srophe-compliant TEI XML records. Developed for Syriaca.org.
GNU General Public License v3.0

First draft of subjects transform #39

Closed by wlpotter 2 years ago

wlpotter commented 2 years ago

@dlschwartz

I know there are a few things we still need to finalize, but I went ahead and output a first draft of the taxonomy from the CSV. Here are those records: https://github.com/wlpotter/csv-to-srophe/tree/main/test/out/csv-tests/subjects

The following have good coverage of the various data we would want to check (though you may want to spot-check others as needed)

A few issues I've noticed:

  1. Some elements have an xmlns declaration for SKOS. I think I've figured out where that's coming from, but I need to do a bit more testing.
  2. All SNAP relationships are currently skos:closeMatch. Per #35 I've made two columns for SNAP (one closeMatch, one broadMatch), but you will need to go through and decide which column each SNAP relationship belongs in.
  3. The tei:note elements have xml:ids and a type attribute, which I don't think should be there. The xml:id on the headword should stay, though. Is that right? (cf. https://syriaca.org/keyword/alliance-with/tei)
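
To make the two-column arrangement in item 2 concrete, here is a minimal sketch in Python (not the repository's XQuery). The column names `snap.closeMatch` and `snap.broadMatch` and the sample SNAP values are hypothetical placeholders, since the actual CSV headers and data are not shown in this thread. The point is only that the column a value sits in decides which SKOS predicate it receives.

```python
import csv
import io

# Hypothetical column names; the real CSV headers are not shown in this thread.
COLUMN_TO_PREDICATE = {
    "snap.closeMatch": "skos:closeMatch",
    "snap.broadMatch": "skos:broadMatch",
}

def snap_relations(row: dict) -> list:
    """Return (predicate, target) pairs for every non-empty SNAP cell in a row."""
    pairs = []
    for column, predicate in COLUMN_TO_PREDICATE.items():
        target = (row.get(column) or "").strip()
        if target:
            pairs.append((predicate, target))
    return pairs

# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "label,snap.closeMatch,snap.broadMatch\n"
    "alliance with,snap:AllianceWith,\n"
    "example term,,snap:ExampleBroader\n"
)
rows = list(csv.DictReader(sample))
relations = [snap_relations(r) for r in rows]
```

Moving a value from one column to the other changes its predicate without touching the transform itself, which is why the sorting has to happen in the spreadsheet.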

Let me know if you see any other bugs.

wlpotter commented 2 years ago

The namespace issue was more complicated than I thought; see #40.

wlpotter commented 2 years ago

One thing I noticed is that the script is prepending "http://syriaca.org/keyword/" to the //relation/@passive attributes for SNAP relations. I think this is because I set it up to prepend the entity base URI to any value that is not already a full URI (this let us use just the numerical portion of URIs for the persons transform).
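
The behavior described above can be sketched roughly as follows. This is a Python illustration, not the actual XQuery in the repository; the function name `expand_uri` is hypothetical, and the assumption that SNAP targets are written as prefixed names such as `snap:AllianceWith` is mine, since the thread does not show the values.

```python
import re

# Base URI quoted in the comment above.
KEYWORD_BASE = "http://syriaca.org/keyword/"

def expand_uri(value: str, base: str = KEYWORD_BASE) -> str:
    """Prepend `base` unless `value` already looks like a full http(s) URI."""
    value = value.strip()
    if re.match(r"^https?://", value):
        return value
    return base + value

# A bare identifier is expanded, which is the intended shortcut.
short_form = expand_uri("123")

# A full URI passes through unchanged.
full_form = expand_uri("http://syriaca.org/person/123")

# A prefixed name (assumed form for SNAP targets) is not a full URI,
# so it gets the base prepended as well, which is the unwanted result.
snap_form = expand_uri("snap:AllianceWith")
```

Under that assumption, any rule that only checks for a full URI will also catch prefixed names, so a fix would need either an extra check for such prefixes or a per-column setting for whether to expand.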

I will likely turn this into a separate issue as it may require some rethinking of the code.

wlpotter commented 2 years ago

I've made #41 for the entity base URI issue.

The note xml:ids and types have been removed in ab2500188024c196c490cfb69fee1083aaa02577

All of these should now be fixed; please open new issues as other bugs arise.