ontoportal-lirmm / annotators

Web service to add functionalities to the http://bioportal.bioontology.org and similar ontology annotators

Identify the French NegEx/Context trigger terms #19

Closed by jonquet 7 years ago

jonquet commented 7 years ago

This task is related to #18 and will focus on generating/translating/extracting/finding the trigger terms for French.

Louise has already sent a list. See also the email exchange with Wendy's group?

Followed by @amineabdaoui

amineabdaoui commented 7 years ago

I automatically translated the English trigger terms found in the ConText code (355 trigger terms).

First, I used a Perl script that queries six online translators (including babla, dico_isc_cnrs, sensagent, linguee and wordreference) and counts, for each translation, how many of the tools return it. The script was originally written to translate a sentiment lexicon from English to French. We found that translations returned by at least 3 of the 6 online translators are usually correct; see http://www.lirmm.fr/~abdaoui/publications/FEEL.pdf. I therefore created a first list of French trigger terms by keeping the translations returned by at least 3 online translators (attached file: ScriptTranslations.txt).
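The "at least 3 out of 6" rule described above can be sketched as a simple vote count. This is only an illustrative Python sketch, not the Perl script itself: the function name `vote_translations`, the placeholder tool names `tool_1` to `tool_6` and the sample candidates are all assumptions made for the example.

```python
from collections import Counter


def vote_translations(candidates_by_tool, min_votes=3):
    """Keep translations proposed by at least `min_votes` distinct tools.

    `candidates_by_tool` maps a tool name to the list of translations it
    returned for one English trigger term (hypothetical input format).
    """
    votes = Counter()
    for translations in candidates_by_tool.values():
        # Normalise and de-duplicate so each tool votes once per translation.
        unique = {t.strip().lower() for t in translations if t.strip()}
        for translation in unique:
            votes[translation] += 1
    return sorted(t for t, n in votes.items() if n >= min_votes)


# Made-up candidates for a single English trigger term (illustration only).
example = {
    "tool_1": ["nie", "dément"],
    "tool_2": ["nie"],
    "tool_3": ["Nie ", "refuse"],
    "tool_4": ["refuse"],
    "tool_5": ["dément", "nie"],
    "tool_6": [],
}

kept = vote_translations(example)
print(kept)
```

Lowering `min_votes` trades precision for coverage; the threshold of 3 is the one reported above as usually giving correct translations.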

Then, I noticed that the first translation proposed by Google Translate seemed to give better results, especially for compound terms. I therefore created a second list of French trigger terms based on the first translation returned by Google Translate (attached file: GoogleTranslations.txt).

I tested the Java code with the current automatically obtained French trigger terms and it worked well (after a few adaptations).

Now, I suggest manually validating and enriching all the automatically obtained entries. For each English entry, I'll display the available translations (those obtained by the Perl script, those from Google Translate and those translated by Louise Deléger). The annotator will have to validate or modify the French terms, their scope and their action.

PS: In my opinion, searching for more trigger terms (for instance by extending to synonyms) is not worthwhile, because Wendy Chapman has said several times in her papers that only a few words appear very frequently while most words occur only a few times. So I think we would end up manually annotating a large number of words that may never occur.

However, it would be interesting to use a development set to evaluate and then enrich our French trigger terms. Should we wait for the EHRs from HEGP?

ScriptTranslations.txt GoogleTranslations.txt

amineabdaoui commented 7 years ago

A list of French trigger terms has been generated according to the following process:

1. Automatic Translation: the English trigger terms have been translated automatically using web translators (Google Translate and six dictionary-based translators).

2. Manual Validation and Enrichment: a human annotator manually validated and enriched all the automatically obtained entries.

3. Enrichment with Deléger's trigger list: the French list obtained at the previous step has been merged with the one produced in (Deléger and Grouin, 2012). Automatic checks were applied, and differences were resolved manually.

4. Enrichment with Burgun's trigger list: the obtained list has been merged with the one presented in (Garcelon et al., 2014). The merging process was conducted manually.
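The merging in steps 3 and 4 can be pictured as a dictionary merge that flags disagreements for manual review. The following Python sketch rests on assumptions: the `(category, scope)` annotation format, the scope labels, the helper names and the sample entries are hypothetical and are not taken from the actual lists.

```python
from collections import Counter


def normalise(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def merge_trigger_lists(base, other):
    """Merge two {term: (category, scope)} dicts.

    Returns (merged, conflicts). A term present in both lists with different
    annotations keeps the base annotation and is reported in `conflicts`
    so that a human can resolve the difference.
    """
    merged = {normalise(term): ann for term, ann in base.items()}
    conflicts = {}
    for term, annotation in other.items():
        key = normalise(term)
        if key not in merged:
            merged[key] = annotation
        elif merged[key] != annotation:
            conflicts[key] = (merged[key], annotation)
    return merged, conflicts


def count_by_category(merged):
    """Count trigger terms per category (negation, history, experiencer)."""
    return Counter(category for category, _scope in merged.values())


# Made-up entries for illustration only.
base = {
    "pas de": ("negation", "forward"),
    "antécédent de": ("history", "forward"),
}
other = {
    "Pas  de": ("negation", "forward"),
    "antécédent de": ("history", "backward"),
    "mère": ("experiencer", "forward"),
}

merged, conflicts = merge_trigger_lists(base, other)
print(count_by_category(merged))
print(conflicts)
```

Keeping the base annotation on conflict is an arbitrary choice for the sketch; in the process described above, such differences were resolved manually.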

The resulting list contains 711 French trigger terms (604 for negation, 64 for history and 43 for experiencer). It is available at: https://drive.google.com/file/d/0B3J2GWmU0NTydF84MW5MQTF1VjQ

jonquet commented 7 years ago

I shall close the tracker for now, as we will release a first version with this set of trigger terms. Follow-up in #18.