srophe / syriaca-data

Repository for Syriaca.org TEI data, used by srophe-eXist-app.

Format for Syriaca controlled vocabularies #110

Open nathangibson opened 8 years ago

nathangibson commented 8 years ago

@wsalesky @dlschwartz @davidamichelson So I've been looking into the format we should use for TEI docs that encode controlled vocabularies (confessions, place types, langs, etc.). I have not turned up any good options besides taxonomy and list:

Taxonomy can only be contained in the header (a disadvantage if the document's sole purpose is to define the taxonomy), but I think it has some advantages.

  1. It is more semantically precise.
  2. It makes more sense to grab a catDesc inside a category marked with xml:id than to grab the label before an item marked with an xml:id.
  3. Taxonomy supports multiple languages better, since we can put multiple catDescs with different xml:lang attributes inside a category, whereas list only allows one label per item.
  4. When we reference the controlled vocabulary from the classDecl of our entity records, either a list or a taxonomy could be referenced with a bibl inside a taxonomy tag (yes, this is standard). But if we use taxonomy to define our controlled vocab in another doc, we could just use a copyOf attribute to point to that doc. E.g., <taxonomy xml:id="confessions" copyOf="http://syriaca.org/documentation/confessions.xml#taxonomy"/>
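
To make points 2 to 4 concrete, here is a minimal Python sketch of how a consumer might look up a catDesc by category xml:id and language, and split a copyOf pointer into document and fragment. The category id, the descriptions, and the helper names (cat_desc, split_pointer) are hypothetical; this is not the code srophe-eXist-app uses.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical taxonomy fragment: the category id and descriptions are made up.
TAXONOMY = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="taxonomy">
  <category xml:id="example-confession">
    <catDesc xml:lang="en">Example confession</catDesc>
    <catDesc xml:lang="de">Beispielkonfession</catDesc>
  </category>
</taxonomy>
"""


def cat_desc(taxonomy_xml, category_id, lang):
    """Return the catDesc text for one category in one language, or None."""
    root = ET.fromstring(taxonomy_xml)
    for category in root.iter(f"{{{TEI_NS}}}category"):
        if category.get(f"{{{XML_NS}}}id") != category_id:
            continue
        # A category may hold several catDescs, one per xml:lang.
        for desc in category.findall(f"{{{TEI_NS}}}catDesc"):
            if desc.get(f"{{{XML_NS}}}lang") == lang:
                return desc.text
    return None


def split_pointer(pointer):
    """Split a copyOf-style pointer into (document URI, fragment id)."""
    document, _, fragment = pointer.partition("#")
    return document, fragment
```

For example, cat_desc(TAXONOMY, "example-confession", "de") would pick out the German description, and split_pointer applied to the copyOf value above separates the confessions.xml URL from the "taxonomy" fragment.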

If we do want to include some discussion of the taxonomy in the body of the defining document, we could always refer to the catDesc xml:ids.

What do you think? I'll go ahead and start making some of these controlled vocabs using taxonomy, but I can always convert them if needed.

wsalesky commented 8 years ago

I think taxonomy makes the most sense for our use case. It seems more syntactically correct, and I suspect it may be easier to reuse programmatically. I will have to rework the code that renders the confessions, so if you could do confessions first and let me know when it is ready, I can work on that.

nathangibson commented 8 years ago

OK, sounds good!

nathangibson commented 8 years ago

OK, @wsalesky , what do you think of this? https://github.com/srophe/srophe-eXist-app/blob/dev/srophe-app/documentation/confessions.xml

wsalesky commented 8 years ago

Looks good to me.

nathangibson commented 8 years ago

Do you think we need an xml:id on the taxonomy tag?

wsalesky commented 8 years ago

No, I think it is fine.

nathangibson commented 8 years ago

Uploaded place types (https://github.com/srophe/srophe-eXist-app/blob/dev/srophe-app/documentation/place-types.xml) which should become the source for generating the page: http://syriaca.org/documentation/place-types.html.

In principle, we could make the taxonomy for place types hierarchical, but I left it flat since we would need Thomas' input before deciding for sure which types are nested inside others.
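
If we did go hierarchical, nested category elements would carry the structure. A rough Python sketch of walking such a taxonomy follows; the nesting shown (and every id other than "building") is invented for illustration and is not a proposal for the actual hierarchy.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical nesting, for illustration only: the real hierarchy is undecided.
NESTED = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0">
  <category xml:id="building">
    <catDesc>building</catDesc>
    <category xml:id="church">
      <catDesc>church</catDesc>
    </category>
  </category>
  <category xml:id="settlement">
    <catDesc>settlement</catDesc>
  </category>
</taxonomy>
"""


def category_paths(parent, prefix=()):
    """Yield a tuple of ids, outermost ancestor first, for each category."""
    # findall with a bare tag name matches direct children only,
    # so the recursion mirrors the nesting in the document.
    for category in parent.findall(f"{TEI}category"):
        path = prefix + (category.get(XML_ID),)
        yield path
        yield from category_paths(category, path)


paths = list(category_paths(ET.fromstring(NESTED)))
```

A flat taxonomy like the current one simply yields one-element paths, so the same walk would work before and after any decision about nesting.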

The types listed on the html page had links to view all places of that type (e.g., http://syriaca.org/geo/browse.html?view=type&type=building). I'm not sure where to put this in the TEI, or whether you will auto-generate this in your XSLT.

Currently, place types are indicated by place/@type. We will need to discuss how to replace this with a pointer to the controlled vocabulary.

Also added a head element to the body of confessions.xml, to provide the header that should be displayed at the top of the page.

wsalesky commented 8 years ago

Thanks @nathangibson!

Don't worry about the links to the browse pages; those will be handled by the XSLT.

-Winona

nathangibson commented 8 years ago

@davidamichelson @wsalesky @dlschwartz As far as languages go, unfortunately <langUsage> cannot be empty. This means we cannot simply use @copyOf to point to an external doc with our language definitions. Each <language> tag within langUsage could be empty and use @copyOf to point to our description, but we would still need a language tag in every doc for each language used in that doc. That seems difficult to maintain.
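
To illustrate the maintenance burden, here is a small Python sketch that compares the xml:lang values used in a record against the language elements declared in its langUsage. The record is abbreviated (not schema-valid), and the languages.xml URL, its fragment ids, and the function name are assumptions for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, abbreviated record: the languages.xml URL is an assumption.
RECORD = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language ident="en"
                  copyOf="http://syriaca.org/documentation/languages.xml#en"/>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p xml:lang="en">An English note.</p>
      <p xml:lang="syr">(Syriac text)</p>
    </body>
  </text>
</TEI>
"""


def undeclared_languages(record_xml):
    """Return xml:lang codes used in the record but missing from langUsage."""
    root = ET.fromstring(record_xml)
    declared = {lang.get("ident") for lang in root.iter(f"{TEI}language")}
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)}
    return sorted(used - declared)
```

Here the record uses "syr" without declaring it, which is exactly the kind of drift we would have to check for in every doc.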

I can ask the TEI list whether it's advisable to point our language definitions to an external doc, but I thought I'd mention this limitation first.

nathangibson commented 8 years ago

@wsalesky , I just took a shot at a controlled vocab for author|editor/@role in work records: https://github.com/srophe/srophe-eXist-app/blob/dev/srophe-app/documentation/author-editor-roles.xml. This is very preliminary -- @davidamichelson and I will need to discuss the categories and definitions there.

Please note the use of the following and preferred visualizations:

As I was writing this, I began to see that we might need more in the controlled vocab than we are doing with category and catDesc. Would you agree that we need the following?

  1. A machine-readable role (e.g., "incorrectly-attributed") to be used as the value of @role.
  2. A human-readable role (e.g., "Incorrectly Attributed") to be displayed in the authorship section of the page.
  3. A human-readable short description (e.g., "A person whom others have erroneously credited at any time with the authorship or editing of a work."). This could be used as a mouse-over hint and/or simply as the main description of the role.
  4. A usage note (e.g., "This role indicates that the Syriaca.org editors consider the attribution to be definitely false. An attribution of uncertain accuracy should use the 'attributed' role ...."). This is the fine print for readers who are going to the documentation page to find out exactly how these terms are/should be used.

catDesc is supposed to be a short description, and can't contain many of the TEI elements that are used for longer text blocks. So with category + catDesc we essentially have nos. 1 & 3 of the above. However, I discovered that category can alternatively contain <gloss> and <desc> instead of catDesc, but they cannot be used if catDesc is being used. These can be used multiple times in a single category, and gloss can have a @type on it. desc is also supposed to be a "brief description," but allows for a lot more inside it than catDesc does.

So what would you think of using something like <gloss type='short'> for the human-readable role, <gloss type='long'> for the description, and <desc> for the usage note? (I mean doing this across the board, not just for the author/editor roles.)
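
As a sketch of what that might look like and how rendering code could consume it, the Python below reads one category into the four pieces listed above. The encoding follows the gloss/desc suggestion in this comment; the Role class and read_role function are hypothetical names, and the usage note is shortened to its first sentence.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# One category encoded with gloss and desc instead of catDesc (a proposal only).
CATEGORY = """
<category xmlns="http://www.tei-c.org/ns/1.0" xml:id="incorrectly-attributed">
  <gloss type="short">Incorrectly Attributed</gloss>
  <gloss type="long">A person whom others have erroneously credited at any time with the authorship or editing of a work.</gloss>
  <desc>This role indicates that the Syriaca.org editors consider the attribution to be definitely false.</desc>
</category>
"""


@dataclass
class Role:
    value: str        # 1. machine-readable value for @role
    label: str        # 2. human-readable role name
    description: str  # 3. short description, e.g. for a mouse-over hint
    usage_note: str   # 4. fine print for the documentation page


def read_role(category_xml):
    """Build a Role from a single category element."""
    cat = ET.fromstring(category_xml)
    # Index the gloss elements by their @type ("short" or "long").
    glosses = {g.get("type"): g.text for g in cat.findall(f"{TEI}gloss")}
    desc = cat.find(f"{TEI}desc")
    return Role(
        value=cat.get(XML_ID),
        label=glosses.get("short", ""),
        description=glosses.get("long", ""),
        usage_note=desc.text if desc is not None else "",
    )


role = read_role(CATEGORY)
```

The xml:id doubles as the @role value, so nos. 1 to 4 each map to exactly one place in the category.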

The alternative, if we stay with taxonomy, rather than list (label + item), would be to put certain things like usage notes in the body and link them to the taxonomy categories.

See also: https://github.com/srophe/srophe-eXist-app/issues/540

nathangibson commented 8 years ago

@tacarlson I'm sending a question to the TEI list to gather some more info on this. We'll be interested in your thoughts once you're free. Don't worry, we'll wait for you before making a final decision.