ncbo / ontologies_api

Hypermedia API for NCBO's ontology-related projects
http://data.bioontology.org

Feature: make the api use submission naturalLanguage to choose the default language to display #149

Closed syphax-bouazzouni closed 3 months ago

syphax-bouazzouni commented 3 months ago

This PR is a follow-up of https://github.com/ncbo/ontologies_api/pull/148, and makes the API use the following order of priority when displaying values:

  1. The request language, if the request parameter lang or language is set.
  2. The first value of the submission's naturalLanguage property, if filled.
  3. The portal main language, set to English by default.
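
The priority order above can be sketched as a small resolver. This is only an illustration, not the code from this PR: the function name, its parameters, and the assumption that naturalLanguage holds plain language codes such as "de" are all hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Assumption: the portal main language is English, as stated in the list above.
PORTAL_DEFAULT_LANGUAGE = "en"


def resolve_display_language(
    params: Mapping[str, str],
    natural_languages: Optional[Sequence[str]] = None,
    portal_default: str = PORTAL_DEFAULT_LANGUAGE,
) -> str:
    """Pick the language used to display values (hypothetical helper)."""
    # 1. An explicit request parameter wins (`lang` or `language`).
    requested = params.get("lang") or params.get("language")
    if requested:
        return requested.lower()
    # 2. Otherwise use the first naturalLanguage value of the submission, if any.
    if natural_languages:
        return natural_languages[0].lower()
    # 3. Otherwise fall back to the portal main language.
    return portal_default
```

For a German-only resource such as MDRGER, `resolve_display_language({}, ["de"])` would select German, while a request carrying `lang=fr` would still take precedence.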

This is useful when a resource defines all its values in a language other than the portal default (English). For example, MDRGER has only German values. By default the UI tries to display values in the portal default language (English); since none exist in this case, it falls back to the labels generated at parsing time (the last part of the URIs).

With this PR, you can set the submission's natural language to German, and the UI and API will then display the German values by default instead of the portal default language (English).

codecov-commenter commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 73.37%. Comparing base (3dc9f9d) to head (c57a7ad).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           feature/multilingual-support-add-language-request-middleware     #149      +/-   ##
================================================================================================
+ Coverage   73.28%   73.37%   +0.09%
================================================================================================
  Files          53       53
  Lines        2916     2926      +10
================================================================================================
+ Hits         2137     2147      +10
  Misses        779      779
```

| [Flag](https://app.codecov.io/gh/ncbo/ontologies_api/pull/149/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ncbo) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/ncbo/ontologies_api/pull/149/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ncbo) | `73.37% <100.00%> (+0.09%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ncbo#carryforward-flags-in-the-pull-request-comment) to find out more.

jonquet commented 3 months ago

@alexskr Addressing the issue with a default language per ontology (rather than an overall portal default language) will only partially address the problem. Besides requiring the list of values in the naturalLanguage property to be ordered (so that the first one can be selected as the default for this ontology), BioPortal will have to populate this property for all existing ontologies for it to work.

jonquet commented 3 months ago

In other words, making the backend multilingual without making the UI multilingual is of limited value, mostly because it hides 90% of the advantages of a multilingual BioPortal if it's not done in the UI. The UI and API somehow have to evolve in parallel, especially for this kind of big evolution.

alexskr commented 3 months ago

We understand that the UI needs additional work for full multi-lingual support. However, I think that the API should serve expected content when API users make calls without specifying a language.

If an ontology is primarily in German, then wouldn't it be expected to see the default output in German instead of blank entries? Alternatively, perhaps the default API behavior should be equivalent to lang=all.

jonquet commented 3 months ago

@alexskr I kind of agree with "If an ontology is primarily in German, then wouldn't it be expected to see the default output in German instead of blank entries?" What I am saying is that all 1122 ontologies in BioPortal (or you guys) will have to declare what is their "primary language" (and others) by editing the naturalLanguage property. Do you think this is realistic?

And the list of the naturalLanguage values will need to be ordered (hence the UI too) so we know the first one is the primary.

Implementing a "default" behaviour at the level of the portal avoids relying on a property that is not necessarily filled in.

jvendetti commented 3 months ago

> What I am saying is that all 1122 ontologies in BioPortal (or you guys) will have to declare what is their "primary language" (and others) by editing the naturalLanguage property. Do you think this is realistic?

Yes, this is realistic. @caufieldjh has already curated language tags for all of the ontologies in BioPortal. At this point we simply need to get the data from him and programmatically populate the naturalLanguage attribute on the latest submissions.

> And the list of the naturalLanguage values will need to be ordered (hence the UI too) so we know the first one is the primary.

For clarification, the BioPortal Rails application has been modified to allow a user to select the language or languages for their ontology content. However, the language selector doesn't offer a way to order the languages.

I haven't been involved in the multilingual support effort, so I may be missing some background. In my mind, the concepts of specifying the language of ontology content and specifying the preferred language for serving the ontology are two different things. The Rails application doesn't currently support the latter.

I think an end user should be able to say that their ontology is available in English, French, and German, and separately be able to say that they want BioPortal to default to serving the French version (for example).

caufieldjh commented 3 months ago

Language tag table is here: https://docs.google.com/spreadsheets/d/1rYHzsSUNjjRpgD2LxxmvCvg13P-w5g-HvsUvk8VNNJ0/edit?usp=sharing

alexskr commented 3 months ago

Would it be possible to automatically extract all language tags that are present in the ontology file and fill in naturalLanguage if users don't set it when creating an ontology? Ontology authors might not even realize that imported terms could have multiple languages.

@jonquet how does AgroPortal construct the drop down language selector in the UI?

jvendetti commented 3 months ago

It should be the responsibility of the ontology author to specify the language of the content of their ontology. If they encode it in the ontology source file, then yes - we could extract it at the time that we ingest the ontology. This would be part of the owlapi_wrapper code base, not this one.

As a simple first step, we should move forward with programmatically incorporating the data that Harry provided.

> how does AgroPortal construct the drop down language selector in the UI?

It's not clear what you refer to here. Do you mean the language selector on the add / edit submission form? Or the language selector on the ontology summary pages?

alexskr commented 3 months ago

> It's not clear what you refer to here. Do you mean the language selector on the add / edit submission form? Or the language selector on the ontology summary pages?

The ontology summary page in AgroPortal lists all languages available in the ontology, and we can select one of those languages to view that ontology. That list is generated from somewhere, so my question to @jonquet is whether they manually added it to each ontology or if it was automatically generated.

jonquet commented 2 months ago

> Would it be possible to automatically extract all language tags that are present in the ontology file and fill in naturalLanguage if users don't set it when creating an ontology? Ontology authors might not even realize that imported terms could have multiple languages.

Eventually this should come. But this is another "module" at ontology parsing time that we call "metadata generation"; right now we are only doing "metadata extraction" (e.g., getting owl:versionInfo). And it is not that trivial, because you have to implement a decision rule. For example, if an ontology of 100 classes has 2 classes with a fr label, do you say it is in French? And if 98 are in en, do you say it is in English?
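
To make the decision problem concrete, here is one possible rule, sketched under stated assumptions: count the language tags found on labels and keep only the languages whose share reaches a threshold. The function name and the `min_share` parameter are hypothetical, and the threshold value is exactly the judgment call described above.

```python
from collections import Counter
from typing import Iterable, List


def detect_languages(label_langs: Iterable[str], min_share: float = 0.5) -> List[str]:
    """Return language tags whose share of labels is at least `min_share`,
    most frequent first (hypothetical decision rule)."""
    # Ignore labels that carry no language tag at all.
    counts = Counter(tag.lower() for tag in label_langs if tag)
    total = sum(counts.values())
    if total == 0:
        return []
    return [tag for tag, n in counts.most_common() if n / total >= min_share]


# The example from the comment: 98 labels tagged en, 2 tagged fr.
tags = ["en"] * 98 + ["fr"] * 2
print(detect_languages(tags))                  # only English passes a 50% threshold
print(detect_languages(tags, min_share=0.01))  # a 1% threshold also admits French
```

The same ontology is "English only" or "English and French" depending on the threshold, which is why the result cannot simply be assumed to be correct.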

> @jonquet how does AgroPortal construct the drop down language selector in the UI?

Right now it is manually populated by the ontology developer or an AgroPortal curator.

jonquet commented 2 months ago

> Language tag table is here: https://docs.google.com/spreadsheets/d/1rYHzsSUNjjRpgD2LxxmvCvg13P-w5g-HvsUvk8VNNJ0/edit?usp=sharing

@caufieldjh can you comment on how you generated these values? Is this solid information (curated/evaluated) that we could use to populate the naturalLanguage property? How did you address the example question mentioned above to @alexskr?

jonquet commented 2 months ago

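Talking about this issue/behaviour the other day at the AgroPortal team meeting, we came to the conclusion that it is indeed OK (maybe even better) to have 2 metadata properties:

  • naturalLanguage ...
  • userInterfacePreferredLanguage with one value only (also present in naturalLanguage) that sets the default language. This property is BioPortal specific (=> do not make it to MOD)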

Plus, @caufieldjh's values will populate the BioPortal userInterfacePreferredLanguage values, which is less risky and can be "assumed" by BioPortal...

So if you go ahead and implement this mechanism, I think we could reuse it on our side too.
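
A minimal sketch of how the two-property proposal might be enforced: naturalLanguage lists the available languages, and userInterfacePreferredLanguage holds a single value that must also appear in naturalLanguage. The function name, the parameters, and the fallback to a portal default of "en" when no preference is set are assumptions made for illustration.

```python
from typing import Optional, Sequence


def ui_default_language(
    natural_languages: Sequence[str],
    ui_preferred: Optional[str] = None,
    portal_default: str = "en",  # assumption: English portal default
) -> str:
    """Return the default display language under the two-property proposal."""
    if ui_preferred is None:
        # No per-ontology preference declared: use the portal-level default.
        return portal_default
    if ui_preferred not in natural_languages:
        # The proposal requires the preferred value to be one of the
        # languages declared in naturalLanguage.
        raise ValueError(
            f"{ui_preferred!r} is not listed in naturalLanguage {list(natural_languages)!r}"
        )
    return ui_preferred
```

With this shape, an ontology available in English, French, and German can default to French without the naturalLanguage list having to be ordered.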

alexskr commented 2 months ago

Would userInterfacePreferredLanguage be primarily used for the UI, or for the API as well?

Is there a downside to using an ordered list of languages for `userInterfacePreferredLanguage` instead of a single value, so that we have some sort of fallback mechanism if a term or comment doesn't exist in the desired language?
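
The fallback idea could look roughly like the sketch below: walk an ordered list of preferred languages and return the first one for which a value exists. The function name and the `labels_by_lang` mapping (language code to label) are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def pick_label(labels_by_lang: Mapping[str, str], preferred: Sequence[str]) -> Optional[str]:
    """Return the label in the first preferred language that has a value."""
    for lang in preferred:
        value = labels_by_lang.get(lang)
        if value:
            return value
    # No label exists in any of the preferred languages.
    return None


labels = {"de": "Herz", "en": "Heart"}
print(pick_label(labels, ["fr", "de", "en"]))  # no French label, so the German one is used
```

A single-valued property is the special case where `preferred` has one element, in which case a missing translation yields no label at all.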

caufieldjh commented 2 months ago

> Language tag table is here: https://docs.google.com/spreadsheets/d/1rYHzsSUNjjRpgD2LxxmvCvg13P-w5g-HvsUvk8VNNJ0/edit?usp=sharing

> @caufieldjh can you comment on how you generated these values? Is this solid information (curated/evaluated) that we could use to populate the naturalLanguage property? How did you address the example question mentioned above to @alexskr?

This is manually curated as many ontologies do not contain sufficient metadata to determine their primary language programmatically. I considered these features:

We see in cases like DCAT3 that language translations may be stored in comments and/or are incomplete across the ontology classes, or in this case, both. In these cases I did not include the languages beyond English in the annotation.

jvendetti commented 2 months ago

> Plus, @caufieldjh's values will populate the BioPortal userInterfacePreferredLanguage values, which is less risky and can be "assumed" by BioPortal...

I don't really understand this statement.

Harry has clearly done a very thorough job of manually curating language tags for all of BioPortal's ontologies using a number of different criteria. I believe this data should absolutely be used to populate the MOD naturalLanguage attribute, and characterizing this as having any sort of associated risks seems implausible.

On the off chance that any of the naturalLanguage values are initialized incorrectly, they could be trivially modified by anyone on the BioPortal team, or the ontology maintainers themselves.

jvendetti commented 2 months ago

> Talking about this issue/behaviour the other day at the AgroPortal team meeting, we came to the conclusion that it is indeed OK (maybe even better) to have 2 metadata properties:
>
>   • naturalLanguage ...
>   • userInterfacePreferredLanguage with one value only (also present in naturalLanguage) that sets the default language. This property is BioPortal specific (=> do not make it to MOD)

I would like to better understand what is being proposed here. The meaning of this userInterfacePreferredLanguage property isn't clear to me from the above description. What is the use case for this? Do you mean that someone uploading an ontology would have to specify not only the available languages, but also which of those languages they want the ontology to display in initially? Wouldn't there be a default language at the level of the portal that could be used for determining the default?