Closed turnovec closed 2 years ago
Very interesting question for the general strategy. This seems like a very common problem, and it is a real problem with Crowdin. I think there are essentially two strategies:
Option 1: We implement a simple separate mechanism for recording additional language synonyms (my preferred option). This means we have a table on GitHub with four columns:
| HPO ID | Synonym type | Synonym | Language |
|---|---|---|---|
| HPO:123 | hasExactSynonym | Online Spielsucht | de |
| HPO:145 | hasRelatedSynonym | Liebeskummer | de |
We can host this table, and you would simply add synonyms to it whenever you find new ones.
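A minimal sketch of how the four-column table proposed above could be checked before it is merged anywhere, assuming it is exported as CSV with exactly those column names. The function name `load_synonyms`, the list of allowed synonym types (taken from the oboInOwl property names), and the third sample row are assumptions made for illustration only; nothing here is an existing HPO tool.

```python
import csv
import io

# Assumption: the allowed values are the oboInOwl synonym properties.
ALLOWED_TYPES = {
    "hasExactSynonym",
    "hasRelatedSynonym",
    "hasBroadSynonym",
    "hasNarrowSynonym",
}
COLUMNS = ["HPO ID", "Synonym type", "Synonym", "Language"]


def load_synonyms(csv_text):
    """Parse the table and return (valid_rows, errors).

    Each error is a (line_number, message) tuple; line 1 is the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows, errors = [], []
    for line_no, row in enumerate(reader, start=2):
        if None in row:  # DictReader stores surplus cells under the key None
            errors.append((line_no, "too many columns"))
            continue
        cleaned = {key: (value or "").strip() for key, value in row.items()}
        if not all(cleaned.values()):
            errors.append((line_no, "empty cell"))
        elif cleaned["Synonym type"] not in ALLOWED_TYPES:
            errors.append(
                (line_no, f"unknown synonym type: {cleaned['Synonym type']}")
            )
        else:
            rows.append(cleaned)
    return rows, errors


# The first two rows come from the table above; the third is an invented
# row with a deliberately wrong synonym type.
SAMPLE = """HPO ID,Synonym type,Synonym,Language
HPO:123,hasExactSynonym,Online Spielsucht,de
HPO:145,hasRelatedSynonym,Liebeskummer,de
HPO:145,exact,Beispielsynonym,de
"""

rows, errors = load_synonyms(SAMPLE)
for line_no, message in errors:
    print(f"line {line_no}: {message}")
```

Returning the errors instead of raising on the first one lets a translation team fix a whole sheet in one pass.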
Option 2: We hack the Crowdin format (error-prone). I don't know whether this is possible, but I am assuming you can provide multiple translations for the same term? If so, we can agree on a standard like "EXACT: Online Spielsucht" that you can add as additional translations to the label property.
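To make the Option 2 convention concrete, here is a rough sketch of how a prefixed translation such as "EXACT: Online Spielsucht" could be split back into a synonym type and a synonym. Only the "EXACT" prefix appears in this thread; the "RELATED" prefix, the mapping to property names, and the function name `parse_prefixed` are assumptions for illustration.

```python
# Assumption: one prefix per synonym type; only "EXACT" was proposed above.
PREFIXES = {
    "EXACT": "hasExactSynonym",
    "RELATED": "hasRelatedSynonym",
}


def parse_prefixed(translation):
    """Return (synonym_type, text).

    synonym_type is None when the string carries no known prefix, in which
    case the whole string is returned as the text.
    """
    head, sep, tail = translation.partition(":")
    key = head.strip().upper()
    if sep and key in PREFIXES and tail.strip():
        return PREFIXES[key], tail.strip()
    return None, translation.strip()


print(parse_prefixed("EXACT: Online Spielsucht"))
print(parse_prefixed("Online Spielsucht"))
```

The sketch also shows why this route is fragile: a mistyped prefix is silently treated as part of the text, so such entries would need a separate check.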
These are my provisional thoughts. Once we have some money to fund the multilingual problem better, a (I almost hate to admit it...) simple app hosted online may be our best option.
I was also thinking about some web application to store this. I can also implement this in Python & Django, maybe not just for our Czech translation team, but also for the others...
We currently have a table on Google Drive shared among the translation team. Does GitHub have an option that provides a user-friendly table for similar purposes?
But this also reminds me of another issue - if HPO distinguishes between "related" and "exact" synonyms, how should we cope with this in Crowdin? There, all synonyms are thrown into one item for translation in the form "#synonym1 #synonym2 #synonym3" without this information...
In Crowdin, we only care about exact synonyms, so everything you are translating there is exact (or a bug).
We usually sync Google Sheets with GitHub, so we can do the same here. The problem is that you will not get all the validation you need right away (table formatting etc.), but I think we can start like this. So basically, a Google Sheet with the columns I mentioned above, and you just keep adding stuff to that. We will implement some automated process to download the sheet and then add it to HPO.
I do not want to comment on the "translation tool" issue - there are thousands of things to consider, and I would rather we support only one way at the moment. However, other groups are starting to use tables for their translations instead of Crowdin (we have a table format we will support for translations) but this is at your own risk - Crowdin is much more powerful in terms of notifying you of changed terms etc.
Note that I think we do not want to translate each synonym individually, since there are probably different kinds of synonyms in different languages. However, each synonym should be an exact match to the main HPO term. I do not know if Crowdin supports that. We are going to try to provide a robust solution to this problem this year.
Maybe Crowdin is a powerful translation tool, as it can offer some translation suggestions. But I think its primary goal was to manage software translations. For example, it would help a lot during translation to see the primary HPO term, definition and synonyms at the same time. Yes, it's possible - you can filter by HPO code and you will then see all three items (or just two, if there are no synonyms). But it's not user-friendly. It could also help to see the hierarchy during the translation process... In Crowdin it's just a long, usually unordered list of items to translate...
Many times other members of the translation team reported errors in the original HPO terms or definitions to me, and I found that they were already corrected on hpo.jax.org (if not, I report them there). So it looks like in Crowdin we are translating some older version of HPO with errors/typos that have already been corrected.
Regarding the synonyms: as Peter wrote, for some English synonyms we don't have exact or meaningful Czech equivalents. And I think it's similar in other languages. And for some terms we have more synonyms in Czech - if so, we enter them in Crowdin, separated by "#" and " ". Is this the right way, or are we doing it wrong?
@turnovec we are on the same page. I thought there was a bit of an advantage with crowdin managing teams for different languages and communicating about some translation (a "social" feature), but maybe I am wrong.
I think answering your Crowdin question needs @drseb - I would not know.
@drseb I actually was wondering about the # symbols in the xliff format as well - I thought this was some kind of unrolling done by Crowdin, and in the UI the translation is presented as 1:1. I was basically assuming that if I see
# syn1 # syn2 # syn3
there is a corresponding
# translated_syn1 # translated_syn2 # translated_syn3
Maybe I was wrong, and if I was, could
# syn1 # syn2 # syn3
be translated to:
# syn4_de # syn5_de
where syn4_de is just some German synonym with no special relationship with syn1?
@matentzn the # are an approach chosen by us to manage sets of synonyms in xliff.
Can anybody tell me what the purpose would be to have 1:1 correspondence between original synonym and translation? So far I don't see the use-case
@turnovec crowdin is the most powerful tool I had with exactly zero resources. If you think you can implement a better solution, I am very much looking forward to it.
@turnovec sorry about not updating the hpo-source in crowdin. I will do it ASAP!
@pnrobinson looking forward to your more robust solution
Just to be clear - there is no specific order to the "#" solution with the synonyms and translators are asked to add synonyms in any number and order they like?
Yes. And the information about whether a synonym is exact or related is also lost. Again: this was the best we could do with our use case in mind.
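As a sketch of what the "#" convention amounts to in practice, the snippet below splits a translated synonym field into an unordered collection of synonyms, with no pairing to the English originals. The function name `split_synonyms` is hypothetical, and the handling of duplicates and empty items is an assumption, not documented Crowdin or HPO behaviour.

```python
def split_synonyms(field):
    """Split a '#'-separated synonym field into a list of unique synonyms.

    Empty items and surrounding whitespace are dropped. The order of the
    result carries no meaning; it simply follows the input.
    """
    synonyms = []
    for part in field.split("#"):
        text = part.strip()
        if text and text not in synonyms:
            synonyms.append(text)
    return synonyms


english = split_synonyms("# syn1 # syn2 # syn3")
german = split_synonyms("# syn4_de # syn5_de")

# The two fields may differ in length: there is no 1:1 mapping.
print(len(english), len(german))
```

One limitation of this approach is that a synonym which itself contains "#" cannot be represented.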
The fact that precision (exact, broad) is lost is a problem though if the translations are not one to one. How do you propose then we materialise translations back into hp.owl? Can we assume that all translated synonyms are "exact"?
Maybe it makes sense to really think about synonyms differently. Having a label and definition translated makes a lot of sense - while synonyms are probably better curated outside the idea of literal translations (as separate annotations).
@matentzn I think we are on the same page -- can/should this issue be closed or moved to the Wiki?
I do not think there are action items for the HPO tracker here. The issue will automatically resurface once we start providing HPO language profiles.
@turnovec for your original question: as this is what everyone else does, feel free to simply add any synonyms to the synonyms field in Crowdin, rather than thinking of them as direct translations in the Crowdin sense. We will, in any case, treat this field this way.
I think we can close this issue, please open a new one if required.
We are currently translating HPO into Czech on the Crowdin platform. I know that here I should mainly report errors/typos and other problems in the original English terminology. But we have run into one thing: there are terms for which there is no English synonym, yet there can be synonyms in the language we are translating into. In these cases there is no item for synonyms on Crowdin and no place where we can put the Czech translation, only the term and the definition. How should we solve this?
I already tried to ask Sebastian (as he is the main contact for translations) through the contact form on Crowdin, but he is probably busy, so I decided to ask here as well...