Language implementation

mcritchlow commented 8 years ago

[ ] Implement Language from Data Model
[ ] Identify any known issues and tag @ucsdlib/domm with questions

lsitu commented 7 years ago

@ucsdlib/domm The class for language is currently defined as dc:LinguisticSystem in the new data model and I don't see any properties for it. How should we model it? Thanks.

ghost commented 7 years ago

I'm having trouble finding the specs for dc:LinguisticSystem. I did find these examples, though: http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:language

remerjohnson commented 7 years ago

Same here. All I see is one property, dcterms:language, with other examples being blank nodes for the language label.

I think the issue here is that if we do rely on LinguisticSystem as a class, we should reference dcterms:language rather than dc:language, which is what we currently use.

Alternatively, we could drop LinguisticSystem and instead use dc:language with a more relaxed range.

Some helpful (kinda?) info on dc vs dcterms ranges: http://wiki.dublincore.org/index.php/FAQ/DC_and_DCTERMS_Namespaces

mcritchlow commented 7 years ago

We could likely dereference the language label(s) for indexing and UI display. So, based on feedback from @remerjohnson and @GregReser, could we just persist a language URI via dcterms:language and drop the LinguisticSystem class?
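A minimal sketch of persisting only the URI, written in plain Python with no RDF library: the object of dcterms:language is an IRI node rather than a literal, and any label would be resolved separately at index time. The subject URI is a hypothetical placeholder, and the language URI follows the id.loc.gov ISO 639-2 pattern discussed later in this thread.

```python
# Full IRI for dcterms:language (DCMI Metadata Terms namespace).
DCTERMS_LANGUAGE = "http://purl.org/dc/terms/language"


def language_statement(subject_uri: str, language_uri: str) -> str:
    """Return one N-Triples line whose object is an IRI node, not a literal.

    No label is stored; it would be looked up from the URI when indexing.
    """
    return f"<{subject_uri}> <{DCTERMS_LANGUAGE}> <{language_uri}> ."


# Hypothetical subject URI, for illustration only.
print(language_statement("http://example.org/objects/1",
                         "http://id.loc.gov/vocabulary/iso639-2/eng"))
```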

lsitu commented 7 years ago

It looks like the terms are a little confusing (dc11/dc vs. dc/dcterms), and dc11/dc is what's referenced in the RDF vocabularies we use in Hydra projects: DC11 / DC.
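To make the two conventions concrete, here is a small sketch that expands the same prefixed names under each prefix map. The Hydra-style map (dc11 for the 1.1 elements, dc for DCMI terms) mirrors the DC11 / DC vocabularies mentioned above; the second map is the dc/dcterms convention used in the DCMI documentation. The maps and the `expand` helper are illustrative only, not part of any library.

```python
# Hydra-style prefixes, following the DC11 / DC vocabulary names.
HYDRA_PREFIXES = {
    "dc11": "http://purl.org/dc/elements/1.1/",
    "dc": "http://purl.org/dc/terms/",
}

# Prefixes as commonly written in DCMI documentation.
DCMI_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie: str, prefixes: dict) -> str:
    """Expand a prefix:local name into a full IRI using the given prefix map."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local
```

So `expand("dc:language", HYDRA_PREFIXES)` and `expand("dcterms:language", DCMI_PREFIXES)` both give http://purl.org/dc/terms/language, while "dc:language" under the DCMI map is the 1.1 element http://purl.org/dc/elements/1.1/language. The same prefix naming two different properties is exactly where the confusion comes from.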

ghost commented 7 years ago

I agree that the prefixes dc11 and dc are confusing. dc and dcterms seem much more intuitive. I guess we are stuck with them since the Hydra projects are using them.

I like @mcritchlow's idea to drop LinguisticSystem. It's too vague to be useful, and all we really need is a language URI anyway. I'm in favor of using the basic dc11 when possible. I think we established previously that you can use non-literals in dc11, so I would go with that.

mcritchlow commented 7 years ago

At one point we tried to reconcile the dc11/dc namespacing in the data dictionary. If this is out of sync again, we should probably file a ticket to get it back in sync. I agree dc11 and dc are confusing, but it is what it is.

remerjohnson commented 7 years ago

Ah okay, the missing dcterms: prefix I was seeing makes more sense now (although the general framework is a bit confusing!). Agreed, we could drop the class.

arwenhutt commented 7 years ago

I don't think we want to specify a separate class for language values; we want to keep this simple, so I think dropping LinguisticSystem makes sense.

Whether we want to limit the value to a URI is another question.

@lsitu how is language implemented in CurationConcerns? I am not sure we have any specialized needs for the language element, so perhaps we can just use the "out of the box" property.

lsitu commented 7 years ago

@arwenhutt I think it's just a literal with dc11:language. Here is the default definition: https://github.com/projecthydra/curation_concerns/blob/master/app/models/concerns/curation_concerns/basic_metadata.rb#L52-L54

arwenhutt commented 7 years ago

Okay, so if we want to restrict to URIs only, then we would need to add our own range restriction.
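As a language-neutral illustration of what a URI-only range restriction would check, here is a plain-Python sketch. It is not the CurationConcerns or ActiveFedora mechanism (in the app, the rule would live in the model's property or validation layer); the function names are hypothetical, and accepting only absolute http(s) URIs with a host is an assumption about what "URI" should mean here.

```python
from urllib.parse import urlparse


def is_language_uri(value: str) -> bool:
    """Accept only absolute http(s) URIs with a host; reject labels or bare codes."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_languages(values):
    """Return the values as a list, or raise if any of them is not a URI."""
    values = list(values)
    bad = [v for v in values if not is_language_uri(v)]
    if bad:
        raise ValueError(f"language values must be URIs: {bad}")
    return values
```

With this rule, "http://id.loc.gov/vocabulary/iso639-2/eng" passes, while literals such as "English" or "eng" are rejected instead of being stored alongside URIs.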

@lsitu do you think it will be easier to work with if we restrict the range to URI or leave it open and potentially have a mix of URI and literal values?

lsitu commented 7 years ago

@arwenhutt I think restricting it to URIs looks good.

remerjohnson commented 7 years ago

If we mixed URIs and literals, we would have to do something to the value to indicate that it's a URI, correct? Like appending ^^dcterms:URI to values: "http://lexvo.org/id/iso639-3/eng"^^dcterms:URI
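If the range were left open, the distinction would have to be carried by how each value is serialized. This sketch contrasts three N-Triples object forms for the same lexvo value: an IRI node, a literal typed with the dcterms:URI datatype (the option raised above), and a plain literal that carries no marker at all. The helper names are hypothetical, and the escaping is deliberately minimal.

```python
DCTERMS_URI = "http://purl.org/dc/terms/URI"
# Characters that are not allowed inside an N-Triples IRI reference.
_BAD_IRI_CHARS = set(' <>"{}|^`\\')


def _escape(value: str) -> str:
    """Minimal escaping for N-Triples string literals (backslash and quote)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def as_iri_node(value: str) -> str:
    """Object as an IRI node: the form a URI-only range would use."""
    if any(ch in _BAD_IRI_CHARS for ch in value):
        raise ValueError(f"not usable as an IRI: {value!r}")
    return f"<{value}>"


def as_typed_literal(value: str) -> str:
    """Object as a string literal marked with the dcterms:URI datatype."""
    return f'"{_escape(value)}"^^<{DCTERMS_URI}>'


def as_plain_literal(value: str) -> str:
    """Object as a plain literal: nothing says it is a URI."""
    return f'"{_escape(value)}"'
```

With a URI-only range, only `as_iri_node` is ever needed, which avoids having to check every stored value for one of the other two forms.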

mcritchlow commented 7 years ago

Is there a scenario where we wouldn't be able to assign a URI?

arwenhutt commented 7 years ago

I don't think so. I initially thought about the values we used in the Korean and Chinese posters, like "ko-hang" (although I realize those aren't dc(whatevs):language but xsd:language), but even for that, Wikipedia has us covered: https://en.wikipedia.org/wiki/Template:ISO_639_name_ko-Hang

arwenhutt commented 7 years ago

This is a little apples and oranges, but my point is that I don't think assigning a uri for a language should be an issue.

lsitu commented 7 years ago

@ucsdlib/domm Could we have a more specific CV list of languages with the URIs and labels we want, like https://github.com/ucsdlib/dams5-cc-pilot/issues/14#issuecomment-249976945? I think we can import and manage them the same way as we do for type (typeOfResource), as local authorities. Thanks.
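Assuming the language CV is handled like the typeOfResource list, a local authority essentially comes down to a URI-to-label map that supports label lookup and label search (for example, to back an autocomplete field). The class below is a hypothetical Python stand-in for illustration, not the project's actual local-authority code.

```python
from dataclasses import dataclass, field


@dataclass
class LocalAuthority:
    """Hypothetical in-memory controlled vocabulary: URI -> display label."""

    name: str
    terms: dict = field(default_factory=dict)

    def add(self, uri: str, label: str) -> None:
        self.terms[uri] = label

    def label_for(self, uri: str) -> str:
        # Raises KeyError for a URI that is not in the CV.
        return self.terms[uri]

    def search(self, query: str):
        """Case-insensitive substring match on labels, as (label, uri) pairs."""
        q = query.lower()
        return sorted((label, uri) for uri, label in self.terms.items()
                      if q in label.lower())
```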

remerjohnson commented 7 years ago

@lsitu As reported in the Sprint meeting, I will look into this CV.

Actually, I have a question. Would a CV of just the bare URIs work, or do you need labels as well? If you need labels, how should I represent them in the Sheet?

mcritchlow commented 7 years ago

@remerjohnson Would the URIs have a consistent property we could grab labels from? If so, a list of just the URIs might be OK.

lsitu commented 7 years ago

@remerjohnson If you want to provide us with labels and URIs, a tab-delimited format would be great. But as @mcritchlow mentioned above, if there's a place we can consistently grab the label from, then a list of URIs will work.

remerjohnson commented 7 years ago

@mcritchlow I believe so. I've attached an example URI from the id.loc.gov ISO639-2 authorities; each URI seems to have a good English label for both the MADS authoritativeLabel and the skos:prefLabel. Let me know if that works, @lsitu, or I can just provide a CSV with the URIs + labels (and maybe the codes as well?).

Also, are we sticking with ISO639-2? :smile: And is id.loc.gov better than, say, lexvo?

[Attached image: id loc gov_iso639-2]
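If we only store bare URIs, the label would be picked from the dereferenced RDF. This sketch assumes the URI's RDF has already been fetched and parsed (by some RDF library, not shown) into (subject, predicate, value, language-tag) tuples. It prefers skos:prefLabel and falls back to madsrdf:authoritativeLabel, the two properties noted above. We have not confirmed how id.loc.gov language-tags these labels, so untagged values are accepted too; `pick_label` is a hypothetical helper name.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
MADS_AUTH_LABEL = "http://www.loc.gov/mads/rdf/v1#authoritativeLabel"


def _tag_matches(tag, lang):
    # Accept untagged labels as well as "en" and regional forms like "en-US".
    return tag is None or tag.lower() == lang or tag.lower().startswith(lang + "-")


def pick_label(triples, subject, lang="en"):
    """Return the first matching label for subject, or None if there is none."""
    triples = list(triples)  # allow generators; the data is scanned more than once
    for predicate in (SKOS_PREF_LABEL, MADS_AUTH_LABEL):
        for s, p, value, tag in triples:
            if s == subject and p == predicate and _tag_matches(tag, lang):
                return value
    return None
```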

lsitu commented 7 years ago

@remerjohnson I think we can extract the labels from other formats as well. Do the CV values (URI and Label (English)) from the tab-delimited file http://id.loc.gov/vocabulary/iso639-2.tsv look good?
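A sketch of reading that tab-delimited export into (URI, label) pairs for import as a local authority. It reads text that has already been downloaded (the download step is omitted), and it assumes the header row names the columns "URI" and "Label (English)", as described in the comment above; check the actual header before relying on those names.

```python
import csv
import io


def load_cv(tsv_text: str, uri_col="URI", label_col="Label (English)"):
    """Parse tab-delimited text into a list of (uri, label) pairs.

    Rows missing either value are skipped. Column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        uri = (row.get(uri_col) or "").strip()
        label = (row.get(label_col) or "").strip()
        if uri and label:
            rows.append((uri, label))
    return rows
```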

remerjohnson commented 7 years ago

@lsitu Yup, that looks perfect.

@ucsdlib/domm: do we want to review this, or are we good to okay it and move forward?

ucsdlib / dams5-cc-pilot

Language implementation #2