[SCHEMA] Add language tags to labels, synonyms, and acronyms

cthoyt commented 1 year ago

Describe the problem you would like to solve

It isn't possible to tell directly which language the label, a synonym, or an acronym in a given ROR entry is written in. I would like the data structure to include a language tag for each of these elements so that I can filter out irrelevant languages based on the application's locale.

Describe the schema change that you would like in order to solve the problem

Rather than using plain strings for the label, synonym, and acronym fields, I'd suggest switching to dictionaries with a "value" key and a "lang" key, where "lang" holds a standard language tag, such as those used in OWL and other XML formats.
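A minimal Python sketch of the proposed shape and the locale filtering it would enable. It assumes BCP 47 tags (the format used by xml:lang) and hypothetical field names "labels", "synonyms", and "acronyms" taken from this issue's wording, not from any published ROR schema; the helper values_for_locale is also hypothetical.

```python
from typing import Optional, TypedDict


class LangString(TypedDict):
    """A string paired with a BCP 47 language tag such as "en" or "fr-CA"."""
    value: str
    lang: Optional[str]  # None when the language is unknown or not applicable


# Hypothetical entry; field names follow this issue's wording, not a real ROR schema.
entry: dict[str, list[LangString]] = {
    "labels": [
        {"value": "Universität Beispiel", "lang": "de"},
        {"value": "Example University", "lang": "en"},
    ],
    "synonyms": [{"value": "Uni Beispiel", "lang": "de"}],
    "acronyms": [{"value": "UB", "lang": None}],
}


def values_for_locale(items: list[LangString], locale: str) -> list[str]:
    """Return values whose primary language subtag matches the locale's (e.g. "en-US" -> "en")."""
    primary = locale.split("-")[0].lower()
    return [
        item["value"]
        for item in items
        if item["lang"] is not None and item["lang"].split("-")[0].lower() == primary
    ]


print(values_for_locale(entry["labels"], "en-US"))  # ['Example University']
print(values_for_locale(entry["synonyms"], "de"))   # ['Uni Beispiel']
```

Matching only the primary subtag is a deliberate simplification: an "en-US" application would also accept "en-GB" labels. Stricter matching could compare full tags instead.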

Who would benefit from this change?

As I've been going through the ROR bulk content, there are some entries I cannot quickly understand because they have no English label. This seems like an issue for all kinds of users.

Additionally, adding language information would make it easier to identify ROR entries in bulk that have no English label and to prioritize them for curation.
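With language tags in place, finding entries that lack an English label becomes a simple scan. The sketch below assumes a bulk dump that is a JSON array of entries in the hypothetical format proposed above; the IDs, names, and the helper has_label_in are placeholders, not real ROR data or tooling.

```python
import json

# Hypothetical bulk dump in the proposed format; IDs and names are placeholders.
DUMP = """
[
  {"id": "example-1", "labels": [{"value": "Universität Beispiel", "lang": "de"}]},
  {"id": "example-2", "labels": [{"value": "Example University", "lang": "en"},
                                 {"value": "Université Exemple", "lang": "fr"}]},
  {"id": "example-3", "labels": [{"value": "Instituto Ejemplo", "lang": null}]}
]
"""


def has_label_in(entry: dict, lang: str) -> bool:
    """True if any label's primary language subtag equals `lang`; untagged labels never match."""
    return any(
        (label.get("lang") or "").split("-")[0].lower() == lang
        for label in entry.get("labels", [])
    )


entries = json.loads(DUMP)
# Entries with no English-tagged label are candidates for curation.
needs_english = [e["id"] for e in entries if not has_label_in(e, "en")]
print(needs_english)  # ['example-1', 'example-3']
```

Entries whose labels carry no tag at all (like example-3) are flagged too, which seems desirable here since their language would also need checking.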

lizkrznarich commented 1 year ago

Hi @cthoyt , we actually have a request for comment on ROR schema v2.0 open right now that includes many of the changes that you're suggesting here. We'd really appreciate it if you could have a look at https://docs.google.com/document/d/1JNDMoKmjR2y0quWXwFfoJTsIttbltJVN0l5Wddw1cIk and add comments there.

cthoyt commented 1 year ago

Hi @lizkrznarich, I appreciate that you're actively seeking and considering community feedback, but I found this Google Doc very hard to navigate, and that made it hard to get motivated to engage with it.

I also got frustrated reading through the other feedback, since a lot of it seems like spitballing. Perhaps you could moderate the document more actively: incorporate the useful feedback into it, then delete the rest.

I will try to give it another shot anyway.

lizkrznarich commented 1 year ago

@cthoyt Apologies that the schema v2.0 doc is a bit difficult to navigate. We prefer to leave all comments open and visible throughout the initial feedback period, so that everyone can see the discussion history around a given topic. As with our other public feedback calls, our plan for this one is to export and anonymize the comments, and compile them into a summary doc with a recommendation (based on the comments) for each schema section. We'll share this, along with a "final draft" schema v2.0 in mid/late Feb.

adambuttrick commented 4 months ago

Addressed with the release of schema v2.0.
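For readers landing here: as we understand schema v2.0, labels, aliases, and acronyms are merged into a single "names" list in which each item carries "value", "types" (such as "ror_display", "label", "alias", "acronym"), and a "lang" code that may be null. Those field names should be checked against ROR's schema documentation; the record below is an illustrative fragment with placeholder values, and names_of is a hypothetical helper.

```python
from typing import Optional

# Illustrative fragment shaped like a ROR schema v2.0 "names" list.
# Values are placeholders, not copied from the registry.
names = [
    {"value": "Example University", "types": ["ror_display", "label"], "lang": "en"},
    {"value": "Universität Beispiel", "types": ["label"], "lang": "de"},
    {"value": "EXU", "types": ["acronym"], "lang": None},
]


def names_of(names: list[dict], name_type: str, lang: Optional[str]) -> list[str]:
    """Return the values that have the given type and exactly the given language code."""
    return [n["value"] for n in names if name_type in n["types"] and n["lang"] == lang]


print(names_of(names, "label", "de"))    # ['Universität Beispiel']
print(names_of(names, "acronym", None))  # ['EXU']
```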

Issue #121 in ror-community/ror-roadmap.