uchicago-library / mlc_ucla

GNU General Public License v3.0

Primary and Alternate Titles #49

Open svirgilgood opened 1 month ago

svirgilgood commented 1 month ago

Titles

I have looked at the data regarding titles in the database, and there are a few issues with how titles are recorded that I would like to document. I then provide suggestions on how we should move forward with creating titles for the items.

Some of these issues may also exist for collections. But thankfully, there are fewer collections.

Number of Titles

I have constrained the predicate dc:title to have only one value.

:_DcTitleProperty
  sh:datatype xsd:string ;
  sh:maxCount "1"^^xsd:integer ;
  sh:message "dc:titles are required." ;
  sh:minCount "1"^^xsd:integer ;
  sh:name "Title" ;
  sh:path dc:title ;
  .

I am not currently constraining the alternative titles and the dcterms:title. I think these should be constrained where the dcterms:title has to agree with the dc:title, and the dcterms:alternative should not match the dcterms:title.
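A minimal sketch of the two constraints proposed above, written as a plain Python check rather than as SHACL. The function name `check_title_constraints` and the dict-of-lists record layout are assumptions made for illustration; they are not part of the existing shapes.

```python
def check_title_constraints(record):
    """Return a list of violation messages for one item's title properties.

    `record` is a hypothetical layout: a dict mapping a predicate name
    to a list of string values.
    """
    problems = []
    dc_titles = record.get("dc:title", [])
    dcterms_titles = record.get("dcterms:title", [])
    alternatives = record.get("dcterms:alternative", [])

    # dcterms:title has to agree with dc:title.
    if set(dcterms_titles) != set(dc_titles):
        problems.append("dcterms:title does not agree with dc:title")

    # dcterms:alternative should not repeat the dcterms:title.
    overlap = set(alternatives) & set(dcterms_titles)
    if overlap:
        problems.append(
            "dcterms:alternative repeats a title: %s" % sorted(overlap)
        )

    return problems
```

An empty list means both constraints hold for that record.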

For OCHRE, it would be preferable to have a singular, unique title that we could create, and use the other forms of the titles as alternatives. I don't know if there is a need to create a heuristic for defining what a primary title is; there are only 111 titles to worry about.

But some of them are problematic.

itemId Title
14360  Balochi7-edit
14360  Baluchi7-edit

Or:

ItemID Title
18091  Student book 3: Lessons 81-91; Review of Units 12-14
18091  Program 3, Tape 4

I don't know which of these would be primary titles. But I do think concatenating them is less helpful.

How many dc:title triples should we allow in the EDM triples?

How Titles are currently derived

Item titles are currently pulled from the ItemTitle table with the following query:

SELECT     DISTINCT title
FROM       ItemTitle
WHERE      __kp_ItemID = ?
ORDER BY   CAST(ItemTitle.__ItemTitleID AS INTEGER)

(query derived from here; query populated with this input)

The same query is used to populate dc:title, dcterms:alternative, and dcterms:title.

This is the current way the titles are derived, some changes should be applied from the above discussion.

Problems with Primary and Alternate Titles

There are 201 alternate titles:

SELECT DISTINCT it.title
FROM ItemTitle it
WHERE it.type = 'Alternate'

Several items have more than one "Alternate" title, but not if you filter by distinct titles.

50 items have two distinct primary titles: "10167" "10168" "10169" "10170" "10171" "10181" "10182" "10183" "10184" "10185" "10428" "10806" "11942" "12576" "12594" "13066" "13430" "14360" "14361" "15046" "15047" "15174" "15705" "17462" "17994" "18091" "18273" "19340" "19465" "20899" "21655" "22093" "22178" "22179" "22181" "24183" "24184" "24185" "24186" "24187" "24188" "24189" "5720" "5721" "5722" "5723" "6945" "6947" "6954" "6973"
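A minimal sketch of how a list like the one above could be regenerated. It assumes the database is SQLite and that primary titles are stored with `type = 'Primary'`, mirroring the 'Alternate' value in the query above; neither is confirmed here. The helper names and the item 99999 rows are made up for the demo; the table and column names come from the queries in this issue.

```python
import sqlite3

# Assumption: primary titles carry type = 'Primary'.
QUERY = """
SELECT   it.__kp_ItemID
FROM     ItemTitle it
WHERE    it.type = 'Primary'
GROUP BY it.__kp_ItemID
HAVING   COUNT(DISTINCT it.title) > 1
ORDER BY CAST(it.__kp_ItemID AS INTEGER)
"""


def items_with_multiple_primary_titles(conn):
    """Return the ids of items that have more than one distinct primary title."""
    return [row[0] for row in conn.execute(QUERY)]


def demo_connection():
    """Build an in-memory table holding a few rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ItemTitle "
        "(__ItemTitleID TEXT, __kp_ItemID TEXT, title TEXT, type TEXT)"
    )
    conn.executemany(
        "INSERT INTO ItemTitle VALUES (?, ?, ?, ?)",
        [
            ("1", "14360", "Balochi7-edit", "Primary"),
            ("2", "14360", "Baluchi7-edit", "Primary"),
            ("3", "18091", "Student book 3: Lessons 81-91; Review of Units 12-14", "Primary"),
            ("4", "18091", "Program 3, Tape 4", "Primary"),
            # Made-up item: the same primary title recorded twice counts once.
            ("5", "99999", "Made-up title", "Primary"),
            ("6", "99999", "Made-up title", "Primary"),
            ("7", "99999", "Made-up alternate", "Alternate"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(items_with_multiple_primary_titles(demo_connection()))
```

Counting distinct titles per item means a duplicated row does not flag the item.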

Problems with No Titles

There are 22 items that don't have a title:

"29503" "29767" "29899" "37457" "37503" "38496" "38546" "38558" "38571" "38576" "38586" "38626" "38776" "39170" "39417" "39518" "39622" "40534" "40535" "40537" "40563" "40570"

These don't have an entry in the ItemTitle table either.

Two items have an Item.Title_t but not an Item.ItemTitle_list:

"40548": doesn't have an ARKID
"33618": has an ARKID: ark:61001/b28d1mz4g50b

But these are also not in the ItemTitle table.

select i."__kp_ItemID" , i.Title_t , i.ItemTitle_list
FROM Item i
/*WHERE i.ItemTitle_list = "None" OR i.Title_t = "None" ; */
WHERE i."__kp_ItemID" NOT IN (
    SELECT it."__kp_ItemID"
    FROM ItemTitle it
)

Do these items that don't have titles matter? Are these part of the list of records that are filtered out?

Suggestions

  1. Every Item should have only one Primary Title. We could create a JSON object of primary and alternate titles based on a SQL query, which is then updated manually for the 100 or so items that need to be corrected. This object would then be used as the source of truth for titles that are imported into OCHRE.

  2. In order to make sure that titles are unique, I would suggest item titles are created as Primary Title (itemId).
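The two suggestions above could be combined along these lines. This is only a sketch: the function name `build_title_record` and the key names (`itemId`, `primary`, `alternates`, `uniqueTitle`) are assumptions, and the item shown is made up.

```python
import json


def build_title_record(item_id, primary, alternates=()):
    """Build one entry of a hypothetical titles object for manual review."""
    return {
        "itemId": str(item_id),
        "primary": primary,
        # An alternate that repeats the primary title is dropped.
        "alternates": [t for t in alternates if t != primary],
        # Unique title in the form "Primary Title (itemId)".
        "uniqueTitle": "%s (%s)" % (primary, item_id),
    }


if __name__ == "__main__":
    record = build_title_record("99999", "Made-up title", ["Made-up alternate"])
    print(json.dumps(record, indent=2))
```

Keeping the item id inside the unique title means two items that share a primary title still end up with different titles.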

c-blair commented 1 month ago

"I am not currently constraining the alternative titles and the dcterms:title. I think these should be constrained where the dcterms:title has to agree with the dc:title, and the dcterms:alternative should not match the dcterms:title." Agreed.

"I don't know which of these would be primary titles. But I do think concatenating them is less helpful." Agreed.

"How many dc:title triples should we allow in the EDM triples?" It's a repeating field in DC. In EDM the maximum is unbounded.

"The same query is used to populate dc:title, dcterms:alternative, and dcterms:title. This is the current way the titles are derived, some changes should be applied from the above discussion." Agreed.

"For OCHRE, it would be preferable to have a singular, unique title that we could create, and use the other forms of the titles as alternatives. I don't know if there is a need to create a heuristic for defining what a primary title is, there are only a 111 titles to worry about." Agreed.

"But some of them are problematic.

itemId Title
14360  Balochi7-edit
14360  Baluchi7-edit

Or:

ItemID Title
18091  Student book 3: Lessons 81-91; Review of Units 12-14
18091  Program 3, Tape 4"

I would put these in dc:description (mapping that to dcterms:description as you describe above for dcterms:title). If that leaves us without a title, use a controlled vocabulary word from DC kernel metadata. Perhaps for this use "(:unas)" for "unassigned". EDM stipulates dc:title or dc:description (or both), but the lack of a dc:title for EDM is not problematic.

Also, because I am finicky about such things, if any of the above goes into output, please change sh:message "dc:titles are required." ; to sh:message "dc:title is required." ; Thanks.

svirgilgood commented 1 month ago

"How many dc:title triples should we allow in the EDM triples?" It's a repeating field in DC. In EDM the maximum is unbounded.

I am not sure I understand. I think we should only have one dc:title. Though looking at the SHACL shape, I said a ProvidedCHO can have either a dc:title or a dc:description (or both). (or a dc:subject, dcterms:spatial, or dcterms:temporal)

But for item 14360, are we going to say the triples should be:

ex:someArkID
  dc:title "(:unas) (14360)" ;
  dc:description "Balochi7-edit" , "Baluchi7-edit" ; 
  ... 

If it isn't required, it makes things easier to have a unique title for OCHRE. And whatever title we come up with should probably be mapped to the value for dc:title. Though I can see a case where we use a unique title in OCHRE that isn't mapped to dc:title. I don't think that works for the items in OLA though, because many of these Items do have unique titles that we want to preserve.

I think what I will do in my export script is:

  1. If there is only one primary title, that title will become the title.
  2. If there are two primary titles, both will become descriptions, and the title will be (:unas) (ItemID).

Does this reflect what you would expect in what we want for our triples?
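The export rule described above can be sketched as follows. This is not the export script itself: the function name `titles_for_export` and the returned dict layout are assumptions. The sketch also applies the rule to more than two primary titles and raises an error for an item with none, since this thread does not settle either case.

```python
def titles_for_export(item_id, primary_titles):
    """Map an item's primary titles to dc:title and dc:description values."""
    # Drop repeated titles while keeping their original order.
    distinct = list(dict.fromkeys(primary_titles))

    if not distinct:
        # Assumption: items without any title are handled elsewhere.
        raise ValueError("item %s has no primary title" % item_id)

    if len(distinct) == 1:
        # One primary title: it becomes the title.
        return {"dc:title": [distinct[0]], "dc:description": []}

    # Several primary titles: all become descriptions, and the title
    # is the "unassigned" marker followed by the item id.
    return {
        "dc:title": ["(:unas) (%s)" % item_id],
        "dc:description": distinct,
    }


if __name__ == "__main__":
    print(titles_for_export("14360", ["Balochi7-edit", "Baluchi7-edit"]))
```

For item 14360 this yields the same values as the triples sketched earlier in this comment.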

(Thank you for pointing out the typo in the sh:message. I fixed it in the Shapes.)

c-blair commented 3 weeks ago

I misunderstood the question, "How many dc:title triples should we allow in the EDM triples?", to mean how many are allowed by the relevant standards. I see, now, that that wasn't its intent. Yes to your proposed solution. Thanks.