msDesc / consolidated-tei-schema

TEI Manuscript Description ODD Customisation
https://raw.githubusercontent.com/msdesc/consolidated-tei-schema/master/msdesc.rng
BSD 2-Clause "Simplified" License

improve procedures for authority management? #6

Open holfordm opened 6 years ago

holfordm commented 6 years ago

Currently authority control is done through three large files for persons, places, and works respectively. This has worked well for the medieval project which has had a centralized cataloguing structure. As we move to a more decentralized structure, first with Fihrist, later with expanding medieval to the Oxford colleges and possibly Cambridge, the use of single large files is likely to be problematic. It is likely that multiple editors will make changes to the files simultaneously, resulting in complex conflicts and general frustration.

If we agree that this is an issue, I can think of two potential solutions.

  1. (The easiest.) Split the files into individual files, one for each entity, retaining the current identifiers. This would greatly reduce the likelihood of conflicts and would make any that did arise much easier to resolve. The changes needed to the existing indexing processes should not be that great?
  2. Move to a dedicated authority management system. I have used EATS (https://github.com/ajenhl/eats) in the past; it worked well but might need a lot of customisation to be suitable for our projects. It might also be possible to set something up using eXist?
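
To make option 1 concrete, here is a rough Python sketch of the split. It assumes the authority file is TEI with one `<person>` element per entity, each carrying an `xml:id`; the function name `split_authority` and the output layout (one file named after each identifier) are illustrative only, not something already in the repository.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_authority(source, out_dir, tag="person"):
    """Write each entity in `source` to out_dir/<xml:id>.xml.

    Hypothetical helper: assumes TEI-namespaced entities identified by xml:id.
    Returns the list of identifiers written, in document order.
    """
    # Serialise TEI as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", TEI_NS)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    root = ET.parse(str(source)).getroot()
    for entity in root.iter(f"{{{TEI_NS}}}{tag}"):
        entity_id = entity.get(XML_ID)
        if entity_id is None:
            continue  # entries without an identifier are left for manual review
        entity.tail = None  # drop whitespace that followed the element in the big file
        target = out / f"{entity_id}.xml"
        ET.ElementTree(entity).write(str(target), encoding="utf-8", xml_declaration=True)
        written.append(entity_id)
    return written
```

Calling `split_authority("persons.xml", "persons")` would leave the identifiers untouched, so existing references from the manuscript records would still resolve; the indexing scripts would only need to read a folder instead of a single file.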

andrew-morrison commented 6 years ago

As discussed this morning, I've been experimenting with using some XML technologies to get some of the benefits of a dedicated authority management system, without the delay that selecting and setting one up would cause:

What I've added is:

I've copied your persons.xml authority file into persons1.xml in a subfolder called 'bodleian' and created a new persons2.xml with a single entry - a deliberate duplicate for demonstration purposes - so you can open either file in Oxygen, validate, and it will take you to the duplicate. These files can be renamed to whatever you want; just update the href attributes in the persons_master.xml file.

Limitations/issues:

@holfordm: If you have time to try this out, let me know how well it works for you. I can easily change what is displayed in the preview (e.g. add another column for birth year?)

If you want to start using this, let me know because there are a few extra steps. It only works with people authority files at the moment, but it is trivial to set up analogous code for places, organisations and works. And the indexing scripts would need to be pointed to the new 'master' authority files.

andrew-morrison commented 6 years ago

@eifionjones: I probably should have tagged you in on this issue earlier, but it has taken me a while to get my head around the issues.

As Matthew says above, authority files for things like works and people are going to be a nightmare if lots of people are going to need to update them, potentially all at the same time.

So I have developed an experimental system for building one authority list out of multiple individual files, helping people to avoid clashing IDs, and providing a user interface for viewing existing entries. I've now set this up in the fihrist-mss repository. It's just for demonstration purposes at the moment, and only works for person authority lists, but if you have time, it would be good to get your feedback.

If you update your local copy, then open _authority/personsmaster.xml in a web browser (Firefox, Safari or Internet Explorer) you can see 18 people I've copied out of Medieval's authority list. But none of them are actually in that file. Instead they are imported from three separate persons.xml files, each in a separate subfolder. Open either the Oxford or Cambridge file in Oxygen, validate it, and it will find the deliberate duplicate I have added. Switch back to the web browser and paste the ID into the search box, and it'll show you both.
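
For anyone who wants to run the same kind of duplicate check outside Oxygen, the idea can be approximated in a few lines of Python. This is only a sketch of the principle (collect every `xml:id` across the separate files and report any that occur more than once); it is not how the validation in the repository is implemented, and the function name `find_duplicate_ids` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_duplicate_ids(paths):
    """Map each xml:id that occurs more than once to the files it occurs in.

    Hypothetical helper: `paths` is any iterable of authority file paths.
    """
    seen = defaultdict(list)
    for path in paths:
        # Look at every element, so the check does not depend on the entity type.
        for element in ET.parse(str(path)).getroot().iter():
            entity_id = element.get(XML_ID)
            if entity_id is not None:
                seen[entity_id].append(str(path))
    return {entity_id: files for entity_id, files in seen.items() if len(files) > 1}
```

Passing the three per-institution person files to `find_duplicate_ids` should return a dictionary keyed by the clashing ID, with the list of files that contain it; an empty dictionary means no clashes were found.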

Is this likely to be useful for Fihrist?

eifionjones commented 6 years ago

This does look good! I can see this being useful; we'll just have to get our heads around how people will manage authorities in Fihrist. I had somehow envisaged the authority files being generated from the data set (with maybe a Fihrist lookup) with no need for manual editing/intervention, and a separate index of names and identifiers only for people not in Fihrist. Anyway, let me revisit this and get back to you.