wlpotter / csv-to-srophe

A set of XQuery modules for converting CSV data to Srophe-compliant TEI XML records. Developed for Syriaca.org
GNU General Public License v3.0

Person and subjects templates missing schema associations #44

Closed wlpotter closed 2 years ago

wlpotter commented 2 years ago

@dlschwartz I'm not sure which schemas to add for persons and subjects, since we haven't done batch/schema updates for these yet.

For now, should we use:

<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/out/syriacaAll.compiled.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/out/syriacaAll.compiled.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/uniqueLangHW.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

Otherwise I could use what is currently in persons and subjects records:

  1. Persons (using person-13 as a model)
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/out/syriacaAll.compiled.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
  2. Subjects (using abbasids as a model)
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>

This is a priority for finalizing the persons transform and running a test of the subjects transform.

wlpotter commented 2 years ago

The advantage of copying from existing data is that we could be confident a find-and-replace across files would also catch the data coming from this transform (when we get to batch changes). That said, it would likely be easier to have a simple XQuery go through and update all the processing instructions to whatever we decide they should be (even if a record doesn't have any processing instructions), so maybe this issue is not actually something to be concerned about?
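For illustration, here is a rough sketch of that "update every record's processing instructions" pass. It is written in Python with the standard-library `xml.dom.minidom` rather than XQuery, only so that it can be run standalone; the function name `set_xml_model_pis` and its parameters are hypothetical and not part of this repository. It removes any existing top-level `xml-model` instructions and inserts the supplied ones before the root element, so records with no processing instructions are handled the same way.

```python
from xml.dom import minidom


def set_xml_model_pis(xml_text, pi_data_list):
    """Return xml_text with its top-level xml-model PIs replaced.

    pi_data_list holds the pseudo-attribute strings, e.g.
    'href="..." type="application/xml" schematypens="..."'.
    (Hypothetical helper, not taken from csv-to-srophe.)
    """
    doc = minidom.parseString(xml_text)
    root = doc.documentElement

    # Drop every existing top-level xml-model processing instruction.
    for node in list(doc.childNodes):
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-model"):
            doc.removeChild(node)

    # Insert the agreed-upon instructions, in order, before the root element.
    for data in pi_data_list:
        pi = doc.createProcessingInstruction("xml-model", data)
        doc.insertBefore(pi, root)

    return doc.toxml()
```

Because the same function handles records with and without existing instructions, it would not matter which variant the templates emit in the meantime.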

wlpotter commented 2 years ago

Persons should use the syriacaAll schema, so their processing instructions should look like:

<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/out/syriacaAll.compiled.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/out/syriacaAll.compiled.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/uniqueLangHW.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

Taxonomy should use https://raw.githubusercontent.com/srophe/srophe-eXist-app/dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng, so its processing instructions should look like:

<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/srophe/srophe-eXist-app/master/documentation/schemas/uniqueLangHW.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
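The two sets of instructions above differ only in the main schema URL, so one way to keep them in sync is a small lookup table. The following is a minimal Python sketch under that assumption; the keys `"person"` and `"taxonomy"`, the constant names, and the function `xml_model_lines` are all hypothetical labels chosen for this example, while the URLs and namespaces are copied from the instructions listed above.

```python
RELAXNG = "http://relaxng.org/ns/structure/1.0"
SCHEMATRON = "http://purl.oclc.org/dsdl/schematron"
BASE = "https://raw.githubusercontent.com/srophe/srophe-eXist-app/"

SYRIACA_ALL = BASE + "master/documentation/schemas/out/syriacaAll.compiled.rng"
TAXONOMY = BASE + "dev/srophe-app/documentation/odd4Taxonomy/out/odd4Taxonomy.rng"
UNIQUE_LANG_HW = BASE + "master/documentation/schemas/uniqueLangHW.sch"

# Hypothetical entity-type keys mapped to (href, schematypens) pairs.
SCHEMAS = {
    "person": [
        (SYRIACA_ALL, RELAXNG),
        (SYRIACA_ALL, SCHEMATRON),
        (UNIQUE_LANG_HW, SCHEMATRON),
    ],
    "taxonomy": [
        (TAXONOMY, RELAXNG),
        (TAXONOMY, SCHEMATRON),
        (UNIQUE_LANG_HW, SCHEMATRON),
    ],
}


def xml_model_lines(entity_type):
    """Render the xml-model processing instructions for one entity type."""
    return [
        f'<?xml-model href="{href}" type="application/xml" schematypens="{ns}"?>'
        for href, ns in SCHEMAS[entity_type]
    ]


if __name__ == "__main__":
    print("\n".join(xml_model_lines("person")))
```

When the final schemas are ready, only the URL constants would need to change.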

wlpotter commented 2 years ago

I've updated the templates with schemas and re-run them, and I've created an XML dump of the various issues raised by validating the CSV transform output against the schemas. I'm not sure how useful these reports are, since they contain a mix of outdated and non-entity-specific errors. I may try to cut through some of the noise to see if they're useful.

I did end up adding the persons data to the server since we will likely catch any issues with them during the batch changes, etc. in the near future.

I will leave this issue open to remind me to make a separate issue for updating the templates once the final schemas have been created for those modules.

wlpotter commented 2 years ago

#45 and #46 are tracking validation errors from the current schemas. #47 is for updating the templates with the new schemas once those are done.