uwlib-cams / uwlswd_vocabs_marc_006_008

https://uwlib-cams.github.io/uwlswd/
Creative Commons Zero v1.0 Universal

Change ConceptScheme titles #4

Closed gerontakos closed 11 months ago

gerontakos commented 1 year ago
  1. remove "MARC21-008" as a prefix
  2. further shorten titles.
    • change "all materials" to "all" and "common" to "some"
    • remove redundancies like "of books" in "books: illustrations of books"
    • maybe: shorten "computer files" to "computer"
    • maybe: shorten "continuing resources" to "continuing"

Note: the full dct:title value will be used as the dataset filename in GitHub.

briesenberg07 commented 11 months ago

Moving discussion content to this issue, which came first

@dkreisstomkins writes:

Hi @gerontakos and @briesenberg07!

Here are two proposals for names, more or less following instructions from issue 4. For the proposals I went back to the original LoC names (which OMR strays from, especially in the 007s), using "of" to connect the material type with the value title in the natural language proposal, and using a : to connect in the hierarchical proposal. Awkward titles are highlighted in pink.

To shorten the titles further, words (especially material types) can be cut.

I prefer to maintain the hierarchy that the Library of Congress establishes, even though I think I understand that the vocabularies won't be referencing each other, and so the hierarchy could be meaningless.

Thanks for your patience!

David

briesenberg07 commented 11 months ago

Thank you for sharing the 008 names spreadsheet**, @dkreisstomkins! I believe I prefer the concept scheme titles in column H: "UW / hierarchy maintained". Is this your preference as well?

The spreadsheet includes 008 and 007 scheme names. Would 006 follow the same pattern? (Or, you might not be there yet.)

** UW NetID access

briesenberg07 commented 11 months ago

Follow-up question: 'MARC21' will be removed from concept scheme titles. Where will the data describe itself as being related to MARC21? Would this be in the dct:description, etc?

<ConceptScheme> a skos:ConceptScheme ;
   dct:description "[something here?]"@en .

dkreisstomkins commented 11 months ago

Yes! Thank you.

Last question first: my understanding is that MARC21 will be entered under an alternative title entry. Copied here (with apologies for posting in all the most inappropriate places and tagging unnecessarily; I need lessons on GitHub style and etiquette) is my comment below the to-do list: "Regarding item 5: Some vocabularies will have the same alternative title. Is this ok?

"Example: both Computer Files and Some/Common vocabulary sets have a vocabulary numbered 23, which means both will have

<dct:alternative>MARC21 008/23 values expressed using RDF</dct:alternative>

as their entry, without anything differentiating the two concept schemes in this line."

First question: yes, the hierarchical title is my preference. I just realized, while testing the change today, that file names won't support colons (see the proposal worksheet now without colons). It looks a bit less formal and the hierarchy isn't clear without seeing them next to other material types, so that is a weakness.

Follow-up question regarding colons: can I still put a colon in the concept scheme title, or should the data set file name and the concept scheme title be an exact match?

dkreisstomkins commented 11 months ago

Oh! Regarding 007.

There are six vocabularies (for Music and Visual Materials) in 007 which have no corresponding material types in the LoC 007 page. I thought originally that it was some artifact from the before-1982-MARC. See from the 007 content designator history:

"In 1981, the present generalized approach to coding physical description characteristics in field 007 was defined. Prior to that time, this field was defined only in the visual materials and music specifications and contained a variable number of fixed-length entries." (my bolding)

However, on Monday I realized that these six vocabularies are most likely simply misplaced. They have corresponding vocabularies in 008 that match exactly, and nothing (except the note above) ties them to 007. I have no idea why they are there, and so I believe they should be moved to 008. That is why they are included on the sheet!

I haven't looked at 006, but it is just one vocabulary. I'll put it on the sheet now.

@briesenberg07 comments:

@dkreisstomkins I'm not sure I understand what is being proposed here... please let me know if you need further information or feedback. If this is about changing the way that some concept schemes are grouped (you mention above "I believe [they] should be moved to 008"), it most likely needs a different issue.

briesenberg07 commented 11 months ago

You ask in https://github.com/uwlib-cams/uwlswd_vocabs_marc_006_008/issues/9#issuecomment-1664509365:

"Regarding item 5: Some vocabularies will have the same alternative title. Is this ok? Example: both Computer Files and Some/Common vocabulary sets have a vocabulary numbered 23, which means both will have:

<dct:alternative>MARC21 008/23 values expressed using RDF</dct:alternative>

as their entry, without anything differentiating the two concept schemes in this line."

@dkreisstomkins I believe this is okay. Many things we might describe in RDF might have the same value for some attribute(s) or other. For example, if we were creating a vegetable vocab, we might see lots of triples:

veg:[some_veg] vegprop:hasColor veg:green .

And this would be perfectly fine, because we are not relying on this single triple to unambiguously identify the resource.

(This is an oversimplification, but that's how I see it.)
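The same point can be sketched in a few lines of Python, treating triples as plain tuples. This is only an illustration: the subject identifiers `scheme:computerFiles23` and `scheme:common23` are hypothetical placeholders, not the IRIs actually used in the datasets.

```python
# Two concept schemes (hypothetical identifiers) sharing one dct:alternative value.
shared_title = "MARC21 008/23 values expressed using RDF"
triples = {
    ("scheme:computerFiles23", "dct:alternative", shared_title),
    ("scheme:common23", "dct:alternative", shared_title),
}

# The subjects stay distinct even though the literal value is identical.
subjects = {s for s, p, o in triples}
literals = {o for s, p, o in triples}

print(len(subjects), len(literals))  # prints: 2 1
```

The resources are told apart by their identifiers, so a repeated literal does not make them ambiguous.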

briesenberg07 commented 11 months ago

"file names won't support colons [...] can I still put a colon in the concept scheme title, or should the data set file name and the concept scheme title be an exact match?"

I think colons are OK in dct:title values. Per current file-naming conventions, these would be stripped from file names.

Although I think I need to update the guidance there to say that : should be replaced by _, whereas ' should simply be stripped.
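A minimal sketch of the proposed rule, assuming it covers only the two characters mentioned here. The function name `title_to_filename_stem` and the sample titles are hypothetical, and other parts of the file-naming conventions (whitespace handling, for instance) are deliberately left out because they are not described in this thread.

```python
def title_to_filename_stem(title: str) -> str:
    """Hypothetical helper: derive a file-name stem from a dct:title value."""
    # Proposed rule: a colon is replaced by an underscore.
    stem = title.replace(":", "_")
    # Proposed rule: an apostrophe is simply stripped.
    stem = stem.replace("'", "")
    return stem


print(title_to_filename_stem("books: illustrations"))  # prints: books_ illustrations
print(title_to_filename_stem("children's materials"))  # prints: childrens materials
```

The dct:title value keeps its colon in the data; only the derived file name changes.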

gerontakos commented 11 months ago

I don't think the names of the concept schemes matter too much now that you've got two acceptable name formats, both of them improvements. I would say that the awkward natural language values make the hierarchy-maintained values better, even without the colons. As for the one awkward hierarchy-maintained name: why can't you just call "music format of music" "music format"?

You decide on the colon in the actual data. Just be consistent. I think you want to have file names without, dct:titles with. That's fine. Benjamin can handle the documentation, as he stated.

The alternative titles do not need to be unique. However, it seems like we left out format information from the alt titles. So: "MARC21 008/23 values for computer files expressed using RDF" and "MARC21 008/23 values for multiple formats expressed using RDF". I think that's better. Maybe you can do even better.
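As a rough sketch of that pattern, the two example strings can be produced from one template. The function name `alternative_title` and its parameters are hypothetical, chosen only to show where the format information slots in.

```python
def alternative_title(field: str, position: str, material: str) -> str:
    """Hypothetical helper: build an alternative title that names the format."""
    return f"MARC21 {field}/{position} values for {material} expressed using RDF"


computer = alternative_title("008", "23", "computer files")
multiple = alternative_title("008", "23", "multiple formats")

print(computer)
print(multiple)
print(computer != multiple)  # prints: True
```

With the format included, the two 008/23 schemes no longer share an identical alternative title, although uniqueness is still not required.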

Note: in addition to the alt titles, we also connect to MARC21 using a provenance statement and a dct:source property.

As we discussed outside GitHub, you should repair those misplaced 007 vocabularies. Thank you.

gerontakos commented 11 months ago

David completed the rename on Aug 7, 2023.