ontoportal-lirmm / annotators

Web service to add functionalities to the http://bioportal.bioontology.org and similar ontology annotators

Implement the semantic group expansion #14

Closed jonquet closed 7 years ago

jonquet commented 7 years ago

This feature is similar to #13, except that the semantic groups are defined by https://mor.nlm.nih.gov/pubs/pdf/2003-jbi-ob.pdf (page 3). The names and acronyms from the paper must be used to identify the groups.

We need to add a new parameter, &semantic_groups=, that takes the acronyms of semantic groups, e.g., &semantic_groups=GENE

This parameter needs to be expanded to the appropriate semantic types in the proxy service (this project), e.g., &semantic_types=T087,T088,T028,T085,T086

(Note that in the case of T085, the expansion to its subtypes is already described on page 3 of the PDF.)
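A minimal sketch of the expansion described above, written in Python for brevity rather than in the project's own code. The function name `expand_semantic_groups` and the table `SEMANTIC_GROUPS` are hypothetical, and only the GENE entry (taken from the example in this comment) is filled in.

```python
# Partial, illustrative table: only the GENE entry comes from this thread.
# The full table would follow the groups defined in the referenced paper.
SEMANTIC_GROUPS = {
    "GENE": ["T087", "T088", "T028", "T085", "T086"],
}


def expand_semantic_groups(value):
    """Expand a comma-separated list of group acronyms into semantic type IDs."""
    types = []
    for acronym in value.split(","):
        acronym = acronym.strip().upper()
        if not acronym:
            continue  # tolerate stray commas such as "GENE,"
        if acronym not in SEMANTIC_GROUPS:
            raise ValueError(f"unknown semantic group: {acronym}")
        for type_id in SEMANTIC_GROUPS[acronym]:
            if type_id not in types:  # keep order, drop duplicates
                types.append(type_id)
    return types


# &semantic_groups=GENE becomes the semantic_types value shown above.
print("&semantic_types=" + ",".join(expand_semantic_groups("GENE")))
```

Rejecting unknown acronyms with an explicit error is one possible design choice; silently ignoring them would be another.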

jonquet commented 7 years ago

Once implemented (@twktheainur looks into this), assign to @vemonet to add the following UI pop-up on the right of the "Select UMLS Semantic Types" area: "Or, Select UMLS Semantic groups" and, in small characters: "UMLS Semantic Groups are defined by Bodenreider & McCray 2003 (https://mor.nlm.nih.gov/pubs/pdf/2003-jbi-ob.pdf) as more general groups of semantic types."

twktheainur commented 7 years ago

@jonquet You may assign this to me; I'll notify you when I am done with the backend implementation. I will then coordinate with @vemonet so that he can do the UI, and to discuss the particulars of the deployment process.

Best

twktheainur commented 7 years ago

@jonquet @vemonet I implemented a fix for the semantic group expansion and the BRAT output. Initial tests indicate no regression in previous features. I ran the tests locally on my machine without deploying my own ncbo_bioportal, by changing the annotatorURI to that of the LIRMM NCBO BioPortal and temporarily commenting out the proxy host extraction.

The modifications to the REST API are the following:

For the groups, I have added a file with the type/group mappings directly in the project resources (accessible through the classloader).
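As an illustration of reading such a mapping file, here is a hedged Python sketch. It assumes a pipe-delimited layout of the form `GROUP|Group name|TYPE_ID|Type name`; the actual file format in the project resources is not shown in this thread, and the name `load_group_mappings` and the sample rows are purely illustrative.

```python
import io

# Illustrative sample rows in the assumed layout (not the project's file).
SAMPLE = """\
GENE|Genes & Molecular Sequences|T087|Amino Acid Sequence
GENE|Genes & Molecular Sequences|T088|Carbohydrate Sequence
GENE|Genes & Molecular Sequences|T028|Gene or Genome
GENE|Genes & Molecular Sequences|T085|Molecular Sequence
GENE|Genes & Molecular Sequences|T086|Nucleotide Sequence
"""


def load_group_mappings(lines):
    """Build {group acronym: [semantic type IDs]} from pipe-delimited lines."""
    groups = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("|")
        if len(fields) < 3:
            raise ValueError(f"malformed mapping line: {line!r}")
        acronym, type_id = fields[0], fields[2]
        groups.setdefault(acronym, []).append(type_id)
    return groups


# Any iterable of lines works, including an open file object.
print(load_group_mappings(io.StringIO(SAMPLE)))
```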

For the BRAT output, I query the 4store server (groups, CUIs). For now, its address is a constant in the AnnotatorServlet class; a more robust configuration through a properties file on the classpath is strongly advisable.

Since I am not a developer on the annotators project, I am unable to push to a branch for testing. Consequently, I created a patch containing all my modifications against origin/master. I cannot attach it here, though; I will send it to either of you directly upon request.

Most of the implementation for the BRAT format is in a separate dependency, bioportal-annotator-api, that I extracted from the code I had written earlier to produce the evaluations. For now I have added a dependency on the SNAPSHOT version; this will have to change once the artefacts are deployed to the Maven Central repository with @vemonet's credentials.

Please advise as to the course of action I should take for a staging server deployment and tests. Subsequently the issues should be transferred to @vemonet so that he may implement the new features in the web interface.

jonquet commented 7 years ago

(Sorry did not see the text was repeated from #16 )

Concerning the semantic group expansion, sounds good. Be sure to use semantic_groups as the parameter name. It's a little long, but we already have another parameter, group, in the API, so we need to avoid confusion.

I would recommend avoiding a dependency if possible and moving the code into this project. The code we write to parse the annotators' output, if generalizable, should all be maintained in this project.

Assigning to @vemonet to discuss the deployment issue (normal, as this is the first time).

twktheainur commented 7 years ago

I have changed the parameter name to semantic_groups.

The dependency doesn't concern the semantic group expansion as it doesn't involve parsing the annotations.

jonquet commented 7 years ago

Testing on stage:

http://services.stageportal.lirmm.fr/ncbo_annotatorplus/?text=the%20patient%20does%20not%20have%20fever&ontologies=MESH&longest_only=false&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false&expand_mappings=false&semantic_groups=LIVB returns only patient

http://services.stageportal.lirmm.fr/ncbo_annotatorplus/?text=the%20patient%20does%20not%20have%20fever&ontologies=MESH&longest_only=false&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false&expand_mappings=false&semantic_groups=DISO returns only fever

But http://services.stageportal.lirmm.fr/ncbo_annotatorplus/?text=the%20patient%20does%20not%20have%20fever&ontologies=MESH&longest_only=false&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false&expand_mappings=false&semantic_groups=DISO,LIVB generates an error.

In addition, is the link https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt OK to reference for the inclusion of types within groups?

twktheainur commented 7 years ago

@jonquet I have fixed the problem and redeployed both ncbo_annotatorplus and annotator on stageportal.

The file is adequate and correct with regard to the semantic groups, you can use it for documentation purposes.

jonquet commented 7 years ago

Just a quick question: what happens if a semantic_groups and a semantic_types parameter are given together? In theory:

jonquet commented 7 years ago

@vemonet In the UI, I propose using the following French translations for the UMLS Semantic Groups:

ACTI Activités & Comportements
ANAT Anatomie
CHEM Produits chimiques & Médicaments
CONC Concepts & Idées
DEVI Dispositifs
DISO Maladies
GENE Gènes et Séquences moléculaires
GEOG Zone géographiques
LIVB Êtres vivants
OBJC Objets
OCCU Profession
ORGA Organisations
PHEN Phénomène
PHYS Physiologie
PROC Procédure

twktheainur commented 7 years ago

Currently, I expand the groups into their semantic types and add those to the existing semantic_types parameter. So if the groups and the types in semantic_types do not overlap, we currently use the union of the two.

I felt at the time that this was a sensible solution.
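The union behaviour described here could look roughly like the following Python sketch. It is not the code deployed in the proxy; `merge_semantic_parameters` and `GROUPS` are hypothetical names, only the GENE entry is taken from this thread, and T047 is used merely as an example of an explicitly requested type.

```python
from urllib.parse import parse_qs, urlencode

# Partial, illustrative table: only GENE is documented in this thread.
GROUPS = {"GENE": ["T087", "T088", "T028", "T085", "T086"]}


def split_csv(values):
    """Flatten repeated, comma-separated parameter values into one list."""
    return [item.strip() for value in values for item in value.split(",") if item.strip()]


def merge_semantic_parameters(query):
    """Replace semantic_groups by its expansion, merged into semantic_types."""
    params = parse_qs(query, keep_blank_values=True)
    types = split_csv(params.pop("semantic_types", []))
    for group in split_csv(params.pop("semantic_groups", [])):
        if group not in GROUPS:
            raise ValueError(f"unknown semantic group: {group}")
        for type_id in GROUPS[group]:
            if type_id not in types:  # union: no duplicates
                types.append(type_id)
    if types:
        params["semantic_types"] = [",".join(types)]
    # safe="," keeps the commas readable in the rewritten query string.
    return urlencode(params, doseq=True, safe=",")


print(merge_semantic_parameters("text=fever&semantic_types=T047,T087&semantic_groups=GENE"))
# prints "text=fever&semantic_types=T047,T087,T088,T028,T085,T086"
```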

jonquet commented 7 years ago

ok http://services.stageportal.lirmm.fr/ncbo_annotatorplus/?text=the%20patient%20does%20not%20have%20fever&ontologies=MESH&longest_only=false&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false&expand_mappings=false&semantic_groups=DISO,LIVB now returns patient and fever

ok http://services.stageportal.lirmm.fr/annotator/?text=Les%20patients%20ont%20de%20la%20fi%C3%A8vre%20&ontologies=MSHFRE&longest_only=false&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false&expand_mappings=false now returns patients and fièvre
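For repeating such checks, the multi-group test URL can be rebuilt programmatically. The Python sketch below only constructs the URL and performs no request; the helper name `build_annotator_url` is hypothetical, while the base URL and parameters are copied from the stage tests in this thread.

```python
from urllib.parse import quote, urlencode

STAGE_BASE = "http://services.stageportal.lirmm.fr/ncbo_annotatorplus/"


def build_annotator_url(base, text, ontologies, semantic_groups):
    """Assemble an annotator URL with the flags used in the stage tests."""
    params = {
        "text": text,
        "ontologies": ontologies,
        "longest_only": "false",
        "exclude_numbers": "false",
        "whole_word_only": "true",
        "exclude_synonyms": "false",
        "expand_mappings": "false",
        "semantic_groups": ",".join(semantic_groups),
    }
    # quote_via=quote encodes spaces as %20; safe="," leaves commas as-is.
    return base + "?" + urlencode(params, safe=",", quote_via=quote)


print(build_annotator_url(STAGE_BASE, "the patient does not have fever", "MESH", ["DISO", "LIVB"]))
```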

Sounds good! Assigning to @vemonet for the UI development.

twktheainur commented 7 years ago

Now implemented, tested, added to the UI, and pushed to production.