obophenotype / human-phenotype-ontology

Ontology for the description of human clinical features
http://obophenotype.github.io/human-phenotype-ontology/

HPO translation and terms without English synonyms with synonyms in other language #7343

Closed turnovec closed 2 years ago

turnovec commented 2 years ago

We are currently translating HPO into Czech on the Crowdin platform. I know that here I should mainly report errors, typos and other problems in the original English terminology. But we have run into one issue: there are terms for which there isn't any English synonym, although synonyms can exist in the language we are translating into. In these cases Crowdin has no item for the synonym and no place where we can put the Czech translation, only the term and the definition. How should we solve this?

I already tried to ask Sebastian (as he is the main contact for translations) through the contact form on Crowdin, but he is probably busy, so I decided to ask here as well...

matentzn commented 2 years ago

Very interesting question for the general strategy. This seems like a very regular problem, and this is a real problem with Crowdin. I think there are essentially two strategies:

Option 1: We implement a simple separate mechanism for recording additional language synonyms (my preferred option). This means we have a table on GitHub with four columns:

| HPO ID | Synonym type | Synonym | Language |
| --- | --- | --- | --- |
| HPO:123 | hasExactSynonym | Online Spielsucht | de |
| HPO:145 | hasRelatedSynonym | Liebeskummer | de |

We can host this table and you would simply add additional synonyms whenever you find any additional ones.
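A minimal sketch of how such a table could be checked before it is merged, assuming it is stored as a tab-separated file with exactly the four columns proposed above. The function name `read_synonym_table` and the set of allowed synonym types are assumptions for illustration, not part of any existing HPO tooling.

```python
import csv
import io

# Column names as proposed in the table above.
COLUMNS = ["HPO ID", "Synonym type", "Synonym", "Language"]

# Assumed set of allowed synonym types (the oboInOwl synonym properties);
# only the first two are used in the example rows of this thread.
SYNONYM_TYPES = {
    "hasExactSynonym",
    "hasRelatedSynonym",
    "hasBroadSynonym",
    "hasNarrowSynonym",
}


def read_synonym_table(text):
    """Parse a tab-separated synonym table and return (rows, errors)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != COLUMNS:
        return [], [f"unexpected header: {reader.fieldnames}"]
    rows, errors = [], []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        synonym = (row["Synonym"] or "").strip()
        language = (row["Language"] or "").strip()
        if row["Synonym type"] not in SYNONYM_TYPES:
            errors.append(f"line {line_no}: unknown synonym type")
        elif not synonym or not language:
            errors.append(f"line {line_no}: empty synonym or language")
        else:
            rows.append(row)
    return rows, errors


SAMPLE = (
    "HPO ID\tSynonym type\tSynonym\tLanguage\n"
    "HPO:123\thasExactSynonym\tOnline Spielsucht\tde\n"
    "HPO:145\thasRelatedSynonym\tLiebeskummer\tde\n"
)

rows, errors = read_synonym_table(SAMPLE)
```

Rows that fail a check are reported with their line number rather than silently dropped, so a translation team could fix them in the shared table.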

Option 2: We hack the Crowdin format (error prone). I don't know whether this is possible, but I am assuming you can provide multiple translations for the same term? If so, we can agree on a standard like "EXACT: Online Spielsucht" that you can add as additional translations to the label property.

These are my provisional thoughts. Once we have some money to fund the multilingual problem better, a (I almost hate to admit it...) simple app hosted online may be our best option.
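To illustrate what the prefix convention in Option 2 would require on the processing side, here is a rough sketch. Only the "EXACT:" prefix is proposed in this thread; the "RELATED" entry and the function name `parse_prefixed_translation` are hypothetical.

```python
# Hypothetical mapping from prefix to synonym type.
# Only "EXACT" is suggested in the thread; "RELATED" is an assumed extension.
PREFIXES = {
    "EXACT": "hasExactSynonym",
    "RELATED": "hasRelatedSynonym",
}


def parse_prefixed_translation(text):
    """Split a string like 'EXACT: Online Spielsucht' into (synonym_type, synonym).

    Returns (None, text) when there is no known prefix, so that the string
    can be treated as an ordinary label translation.
    """
    prefix, sep, rest = text.partition(":")
    key = prefix.strip().upper()
    if sep and key in PREFIXES and rest.strip():
        return PREFIXES[key], rest.strip()
    return None, text.strip()
```

A label that happens to contain a colon is left untouched unless the part before the colon is one of the agreed prefixes, which is one reason the convention is error prone: a misspelled prefix would silently turn a synonym into a label translation.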

turnovec commented 2 years ago

I was also thinking about some web application to store this. I can also implement this in Python & Django, maybe not just for our Czech translation team, but also for the others...

We currently have a table on Google Drive shared among the translation team. Does GitHub have an option that provides a user-friendly table for similar purposes?

But it also reminds me of another issue: if HPO distinguishes between "related" and "exact" synonyms, how should we cope with this in Crowdin? There, all synonyms are thrown into one item for translation in the form "#synonym1 #synonym2 #synonym3", without this information...

matentzn commented 2 years ago

In Crowdin, we only care about exact synonyms, so everything you are translating there is exact (or a bug).

We usually sync Google Sheets with GitHub, so we can do the same here. The problem is that you will not get all the validation you need right away (table formatting etc.), but I think we can start like this. So basically, a Google Sheet with the columns I mentioned above, and you just keep adding to it. We will implement an automated process to download the sheet and then add it to HPO.

I do not want to comment on the "translation tool" issue - there are thousands of things to consider, and I would rather we support only one way at the moment. However, other groups are starting to use tables for their translations instead of Crowdin (we have a table format we will support for translations) but this is at your own risk - Crowdin is much more powerful in terms of notifying you of changed terms etc.

pnrobinson commented 2 years ago

Note that I think we do not want to translate each synonym individually, since there are probably different kinds of synonyms in different languages. However, each synonym should be an exact match to the main HPO term. I do not know if Crowdin supports that. We are going to try to provide a robust solution to this this year.

turnovec commented 2 years ago

Maybe Crowdin is a powerful translation tool, as it can offer some translation suggestions. But I think its primary goal was to manage software translations. For example, it would help a lot during translation to see the primary HPO term, definition and synonyms at the same time. Yes, it's possible: you can filter by HPO code and you will then see all three items (or just two, if there aren't any synonyms). But it's not user friendly. It could also help to see the hierarchy during the translation process... In Crowdin it's just a long, usually unordered list of items to translate...

Many times other members of the translation team have reported errors in the original HPO terms or definitions to me, and I found that they were already corrected on hpo.jax.org (if not, I'm reporting them there). So it looks like in Crowdin we are translating an older version of HPO, with errors/typos that have already been corrected.

Regarding the synonyms: as Peter wrote, for some English synonyms we don't have exact or meaningful Czech equivalents, and I think it's similar in other languages. For some terms we have more synonyms in Czech; if so, we enter them in Crowdin, separated by "#" and " ". Is this the right way, or are we doing it wrong?

matentzn commented 2 years ago

@turnovec we are on the same page. I thought there was a bit of an advantage with crowdin managing teams for different languages and communicating about some translation (a "social" feature), but maybe I am wrong.

I think answering your Crowdin question needs @drseb - I would not know.

matentzn commented 2 years ago

@drseb I actually was wondering about the # symbols in the xliff format as well - I thought this was some kind of unrolling done by crowdin, and in the UI the translation is presented as 1:1. I was basically assuming, if I see

# syn1 # syn2 # syn3

That there is a corresponding

# translated_syn1 # translated_syn2 # translated_syn3

Maybe I was wrong, and if I was, could

# syn1 # syn2 # syn3 

be translated to:

# syn4_de # syn5_de

Where syn4_de is just some German synonym with no special relationship with syn1?

drseb commented 2 years ago

@matentzn the # are an approach chosen by us to manage sets of synonyms in xliff.

Can anybody tell me what the purpose would be to have 1:1 correspondence between original synonym and translation? So far I don't see the use-case

drseb commented 2 years ago

@turnovec crowdin is the most powerful tool I had with exactly zero resources. If you think you can implement a better solution, I am very much looking forward to it.

drseb commented 2 years ago

@turnovec sorry about not updating the hpo-source in crowdin. I will do it ASAP!

drseb commented 2 years ago

@pnrobinson looking forward to your more robust solution

matentzn commented 2 years ago

Just to be clear - there is no specific order to the "#" solution with the synonyms and translators are asked to add synonyms in any number and order they like?

drseb commented 2 years ago

Yes. And the information about exact or related synonym is also lost. Again: it was the best we could do with our use case in mind.
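A small sketch of how a "#"-separated synonym field could be read under the rules described so far: any number of entries, no meaningful order, and no 1:1 link to the English synonyms. The function names are hypothetical, and labelling every entry as `hasExactSynonym` is an assumption based on the earlier comment that everything translated in Crowdin is meant to be exact.

```python
def split_synonym_field(field):
    """Split '#syn1 #syn2' or '# syn1 # syn2' into a list of unique synonyms.

    Input order is kept only for reproducibility; it carries no meaning.
    """
    synonyms = []
    for part in field.split("#"):
        synonym = part.strip()
        if synonym and synonym not in synonyms:
            synonyms.append(synonym)
    return synonyms


def as_exact_rows(hpo_id, field, language):
    """Turn one translated synonym field into (id, type, synonym, language) rows.

    Assumption: every translated synonym is treated as exact.
    """
    return [
        (hpo_id, "hasExactSynonym", synonym, language)
        for synonym in split_synonym_field(field)
    ]
```

Splitting on "#" alone handles both spellings seen in this thread ("#synonym1 #synonym2" and "# syn1 # syn2"), but it would break a synonym that itself contains a "#" character.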

matentzn commented 2 years ago

The fact that precision (exact, broad) is lost is a problem though if the translations are not one to one. How do you propose then we materialise translations back into hp.owl? Can we assume that all translated synonyms are "exact"?

Maybe it makes sense to really think about synonyms differently. Having a label and definition translated makes a lot of sense - while synonyms are probably better curated outside the idea of literal translations (as separate annotations).

pnrobinson commented 2 years ago

@matentzn I think we are on the same page -- can/should this issue be closed or moved to the Wiki?

matentzn commented 2 years ago

I do not think there are action items for the HPO tracker here. The issue will automatically resurface once we start providing HPO language profiles.

matentzn commented 2 years ago

@turnovec for your original question: as this is what everyone else does, feel free to simply add any synonyms to the synonyms field in Crowdin, rather than thinking of them as direct translations in the Crowdin sense. We will, in any case, treat this field this way.

pnrobinson commented 2 years ago

I think we can close this issue, please open a new one if required.