Closed by mcritchlow 7 years ago
@ucsdlib/domm The class for language is currently defined as dc:LinguisticSystem in the new data model and I don't see any properties for it. How should we model it? Thanks.
I'm having trouble finding the specs for dc:LinguisticSystems. I did find these examples though: http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:language
Same here. All I see is one property, dcterms:language, with other examples being blank nodes for the language label.
I think an issue here is that if we do rely on LinguisticSystem as a class, we should either use dcterms:language and not dc:language as we currently do, or keep dc:language and have a more relaxed range. Some helpful (kinda?) info on dc vs dcterms ranges: http://wiki.dublincore.org/index.php/FAQ/DC_and_DCTERMS_Namespaces
The language label(s) we could likely dereference for indexing and UI display. So based on feedback from @remerjohnson and @GregReser, could we just use/persist a language URI instead via dcterms:language and drop the LinguisticSystem class?
I agree that the prefixes dc11 and dc are confusing. dc and dcterms seem much more intuitive. I guess we are stuck with them since the Hydra projects are using them.
I like @mcritchlow's idea to drop LinguisticSystem. It's too vague to be useful and all we really need is a language URI anyway. I'm in favor of using the basic dc11 when possible. I think we established previously that you can use non-literals in dc11, so I would go with that.
At one point we tried to reconcile the dc11/dc namespacing in the data dictionary. If this is out of sync again, we probably should file a ticket to get those back in sync. I agree dc11 and dc are confusing, but it is what it is.
Ah okay, the lack of dcterms: I was seeing makes more sense now (although the general framework is a bit confusing!). Agreed we could drop the class.
I don't think we want to specify a separate class for language values, we want to keep this simple, so I think dropping LinguisticSystem makes sense.
Whether we want to limit the value to a URI is another question.
@lsitu how is language implemented in CurationConcerns? I am not sure we have any specialized needs for the language element, so perhaps we can just use the "out of the box" property.
@arwenhutt I think it's just a literal with dc11:language. Here is the default definition: https://github.com/projecthydra/curation_concerns/blob/master/app/models/concerns/curation_concerns/basic_metadata.rb#L52-L54
Okay, so if we want to restrict to URIs only, then we would need to add our own range restriction.
@lsitu do you think it will be easier to work with if we restrict the range to URI or leave it open and potentially have a mix of URI and literal values?
@arwenhutt I think restricting it to URIs looks good.
If we mixed URIs and literals, we would have to do something to the value to indicate it being a URI, correct? Like append values with http://lexvo.org/id/iso639-3/eng^^dcterms:URI
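To illustrate the extra handling a mixed range would need, here is a minimal Python sketch, not the project's implementation. It assumes a value counts as a URI only when it parses as an absolute http(s) URL; the function names `is_uri` and `split_language_values` are hypothetical.

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    """Return True when the value parses as an absolute http(s) URI.

    Assumption: only http/https URIs are treated as language URIs.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_language_values(values):
    """Partition mixed language values into (uris, literals)."""
    uris = [v for v in values if is_uri(v)]
    literals = [v for v in values if not is_uri(v)]
    return uris, literals


# Illustrative mix of one URI and two literal values.
mixed = ["http://lexvo.org/id/iso639-3/eng", "English", "ko-hang"]
uris, literals = split_language_values(mixed)
```

With a URI-only range, a check like `is_uri` could run once at ingest and nothing downstream would need to branch on the value type.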
Is there a scenario where we wouldn't be able to assign a URI?
I don't think so. I initially thought about the values we used in the Korean and Chinese posters (although I realize that they aren't dc(whatevs):language but xsd:language) like "ko-hang" - but even for that, wikipedia has us covered: https://en.wikipedia.org/wiki/Template:ISO_639_name_ko-Hang
This is a little apples and oranges, but my point is that I don't think assigning a uri for a language should be an issue.
@ucsdlib/domm Could we have a more specific CV list of languages with the URI and the label we want, like https://github.com/ucsdlib/dams5-cc-pilot/issues/14#issuecomment-249976945? I think we can import and manage them as local authorities, the same way we do for type (typeOfResource). Thanks.
@lsitu As reported in the Sprint meeting, I will look into this CV
Actually, I have a question. Would a CV list of just the bare URIs to support be enough, or do you need labels as well? If you need labels, how should I represent them in the Sheet?
@remerjohnson would the URIs have a consistent property we could grab labels from? If so, a list of just the URIs might be OK.
@remerjohnson If you want to provide us a label and the URI, then a tab-delimited format would be great. But as @mcritchlow mentioned above, if there's a location where we can easily and consistently grab the label, then a list of URIs will work.
@mcritchlow I believe so, but I've attached an example URI from the id.loc.gov ISO639-2 authorities. Each URI seems to have a good English label for both the MADS authoritativeLabel and the skos:prefLabel. Let me know if that works @lsitu, or I can just provide a csv with the URIs + labels (and maybe the codes as well?).
Also, are we sticking with ISO639-2? :smile: And using id.loc.gov is better than, say, lexvo?
@remerjohnson I think we can extract it from other formats. Do the CVs (URI and Label (English)) from the tab delimited format http://id.loc.gov/vocabulary/iso639-2.tsv look good?
@lsitu Yup, that looks perfect.
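As a rough sketch of how a tab-delimited CV of URI plus label could be turned into a lookup for indexing and UI display: the column names `uri` and `label`, the sample rows, and the function `load_language_labels` are assumptions for illustration, and the real id.loc.gov file may use different headers and extra columns.

```python
import csv
import io

# Hypothetical sample rows; the real file's column names and order may differ.
SAMPLE_TSV = (
    "uri\tlabel\n"
    "http://id.loc.gov/vocabulary/iso639-2/eng\tEnglish\n"
    "http://id.loc.gov/vocabulary/iso639-2/kor\tKorean\n"
)


def load_language_labels(tsv_text, uri_column="uri", label_column="label"):
    """Build a {uri: label} lookup from tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[uri_column]: row[label_column] for row in reader}


labels = load_language_labels(SAMPLE_TSV)
```

Only the URI would be persisted on the object; the label would be resolved from a table like this at index time.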
@ucsdlib/domm: do we want to review this, or are we good to okay it and move forward?