Get the annotations in the BRAT format

jonquet commented 8 years ago

This feature will consist of post-processing the results of the annotators (through the proxy service) to return the annotations in the BRAT format as plain text. The main idea is to use the UMLS CUI as the main identifier rather than the actual class URIs.
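As a rough illustration of the target output, the sketch below writes annotations in BRAT standoff (.ann) form. Each span gets one text-bound T line with the semantic group as the entity type, and the UMLS CUI goes in an AnnotatorNotes line instead of the class URI. That CUI placement is assumed here to follow the QUAERO corpus convention. The BratWriter class and the Annotation record are hypothetical names for illustration, not the project's code.

```java
import java.util.List;

// Hypothetical class: serializes annotations to BRAT standoff (.ann) text.
public class BratWriter {

    // Hypothetical annotation model: character offsets, covered text,
    // UMLS semantic group (used as the BRAT entity type) and UMLS CUI.
    public record Annotation(int start, int end, String text, String group, String cui) {}

    public static String toBrat(List<Annotation> annotations) {
        StringBuilder out = new StringBuilder();
        int id = 1;
        for (Annotation a : annotations) {
            // Text-bound annotation: T<id> TAB <type> <start> <end> TAB <covered text>
            out.append('T').append(id).append('\t')
               .append(a.group()).append(' ').append(a.start()).append(' ').append(a.end())
               .append('\t').append(a.text()).append('\n');
            // Note attached to T<id> carrying the CUI (assumed QUAERO-style convention)
            out.append('#').append(id).append('\t')
               .append("AnnotatorNotes T").append(id).append('\t')
               .append(a.cui()).append('\n');
            id++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Annotation> anns = List.of(new Annotation(0, 6, "fièvre", "DISO", "C0015967"));
        System.out.print(toBrat(anns));
    }
}
```

Using the CUI rather than the ontology-specific URI means annotations coming from different ontologies for the same concept collapse to one identifier, which is what an evaluation against a CUI-annotated corpus needs.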

We should add a new possible value, brat, to the &format parameter (i.e. &format=brat). The code implementing &format=rdf can certainly be reused here.

@twktheainur has already written some code for this.

When this is done, assign it to @vemonet so that he can offer the option in the UI.

twktheainur commented 8 years ago

@jonquet @vemonet I implemented a fix for the semantic group expansion and the BRAT output; initial tests indicate no regressions in existing features. I ran the tests locally on my machine without deploying my own ncbo_bioportal, by pointing annotatorURI to the LIRMM NCBO BioPortal and temporarily commenting out the proxy host extraction.

The changes to the REST API are the following:

- Modification of the format parameter: "brat" is now a valid value and produces application/brat output.
- A new parameter, groups, with the syntax groups=GROUP1,GROUP2,... where each GROUPX is the name of a UMLS semantic group, namely: ANAT, CHEM, DEVI, DISO, GEOG, LIVB, OBJC, PHEN, PHYS, PROC
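A minimal sketch of how the comma-separated groups value could be parsed and validated against the ten groups listed above. The class and method names are hypothetical. Trimming spaces and upper-casing input is an assumed leniency, not documented behavior.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for groups=GROUP1,GROUP2,...
public class SemanticGroupsParam {

    // The UMLS semantic groups listed in this issue
    static final Set<String> KNOWN_GROUPS = Set.of(
        "ANAT", "CHEM", "DEVI", "DISO", "GEOG", "LIVB", "OBJC", "PHEN", "PHYS", "PROC");

    // Returns the requested groups in order, without duplicates; rejects unknown names.
    public static Set<String> parse(String value) {
        Set<String> groups = new LinkedHashSet<>();
        if (value == null || value.isBlank()) {
            return groups; // parameter absent: no group filtering
        }
        for (String raw : value.split(",")) {
            String g = raw.trim().toUpperCase(Locale.ROOT);
            if (g.isEmpty()) {
                continue; // tolerate stray commas
            }
            if (!KNOWN_GROUPS.contains(g)) {
                throw new IllegalArgumentException("Unknown semantic group: " + g);
            }
            groups.add(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("DISO, chem,DISO")); // [DISO, CHEM]
    }
}
```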

For the semantic groups, I added a file containing the semantic type/group mappings directly to the project resources (accessible through the classloader).
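The sketch below shows one way such a mapping resource could be read through the classloader and used to expand groups into semantic types (TUIs). It assumes the file uses the pipe-delimited layout of NLM's SemGroups.txt (GROUP|Group name|TUI|Type name). It also assumes the file is named semantic_groups.txt. Both are guesses, and SemanticGroupMappings is a hypothetical class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical loader for the type/group mapping resource.
public class SemanticGroupMappings {

    // Parses lines GROUP|Group name|TUI|Type name into group -> set of TUIs.
    public static Map<String, Set<String>> parse(BufferedReader reader) throws IOException {
        Map<String, Set<String>> byGroup = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\\|");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            byGroup.computeIfAbsent(cols[0].trim(), k -> new HashSet<>()).add(cols[2].trim());
        }
        return byGroup;
    }

    // Loads the (assumed) resource "semantic_groups.txt" from the classpath.
    public static Map<String, Set<String>> load() {
        InputStream in = SemanticGroupMappings.class.getClassLoader()
                .getResourceAsStream("semantic_groups.txt");
        if (in == null) {
            throw new IllegalStateException("semantic_groups.txt not found on the classpath");
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return parse(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Expands the selected groups into the sorted union of their semantic types.
    public static Set<String> expand(Map<String, Set<String>> mapping, Collection<String> groups) {
        Set<String> types = new TreeSet<>();
        for (String g : groups) {
            types.addAll(mapping.getOrDefault(g, Set.of()));
        }
        return types;
    }
}
```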

For the BRAT output, the servlet queries the 4store server (for semantic groups and CUIs). For now, the 4store endpoint is a constant in the AnnotatorServlet class. A more robust configuration through a properties file on the classpath is strongly advisable.
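A minimal sketch of the suggested properties-based configuration, with the current constant kept as a fallback. The file name annotator.properties, the key triplestore.url and the default URL are all placeholders, not values taken from the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical replacement for the hard-coded 4store constant in AnnotatorServlet.
public class AnnotatorConfig {

    static final String RESOURCE = "annotator.properties";                  // assumed file name
    static final String KEY = "triplestore.url";                            // assumed key
    static final String DEFAULT_TRIPLESTORE_URL = "http://localhost:8080/sparql/"; // placeholder

    // Loads the properties file from the classpath; returns empty Properties if it is absent.
    static Properties loadFromClasspath() {
        Properties props = new Properties();
        try (InputStream in = AnnotatorConfig.class.getClassLoader().getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Falls back to the default (today's constant) when the key is not set.
    static String triplestoreUrl(Properties props) {
        return props.getProperty(KEY, DEFAULT_TRIPLESTORE_URL);
    }

    public static String triplestoreUrl() {
        return triplestoreUrl(loadFromClasspath());
    }
}
```

Keeping the default means existing deployments behave as before until a properties file is added.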

Since I am not a member of the annotators project, I cannot push to a branch for testing. Consequently, I created a patch containing all my modifications against origin/master. I cannot attach it here, though; I will send it to either of you directly upon request.

Most of the implementation of the BRAT format lives in a separate dependency, bioportal-annotator-api, which I extracted from the code I had written earlier to produce the evaluations. For now, I added a dependency on the SNAPSHOT version; this will have to change once the artifacts are deployed to the Maven Central repository with @vemonet's credentials.

Please advise on how I should proceed with deployment and testing on a staging server. Afterwards, the issue should be transferred to @vemonet so that he can implement the new features in the web interface.

jonquet commented 8 years ago

Sounds good.

In the mid-term, the join query against the triple store can certainly be removed. Together with the NCBO team, we have identified how to populate any ontology in a portal with CUI/semantic type properties (I will comment on this in another tracker). The annotator itself will then do the filtering and return the annotated classes either with the CUI information directly or, if the CUI is not included, a join with the /ontologies/class service will need to be done (at the ontologies_api level if possible). The point of avoiding the join is to keep the possibility of offering this feature on the main NCBO Annotator, since we do not have access to their SPARQL endpoint. So for now, I would suggest choosing the best implementation approach on the assumption that, as for the RRF ontologies, we will soon have CUI information for every ontology.
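The decision described above (use the CUI when the annotation already carries it, otherwise fall back to a join) could be isolated behind a small resolver, sketched here under stated assumptions. AnnotatedClass and CuiResolver are hypothetical names. The lookup function stands in for whatever join is eventually used, whether the triple store today or the /ontologies/class service later.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical resolver: prefer CUIs returned by the annotator, fall back to a join.
public class CuiResolver {

    // Hypothetical view of an annotated class: its URI and the CUIs it came with (possibly none).
    public record AnnotatedClass(String uri, List<String> cuis) {}

    // Stand-in for the join (triple store query or class service call), keyed by class URI.
    private final Function<String, List<String>> lookupByUri;

    public CuiResolver(Function<String, List<String>> lookupByUri) {
        this.lookupByUri = lookupByUri;
    }

    public List<String> resolve(AnnotatedClass cls) {
        if (!cls.cuis().isEmpty()) {
            return cls.cuis();                  // CUI already in the annotator output: no join needed
        }
        return lookupByUri.apply(cls.uri());    // otherwise perform the join
    }
}
```

Injecting the lookup as a function keeps the resolver independent of how the join is done, so the triple store dependency can be dropped later without touching the BRAT serialization.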

Concerning the semantic group expansion, sounds good. Just report it in #14 and be sure to use semantic_groups as the parameter name. It's a little long, but the API already has another "group" parameter, so we need to avoid confusion.

I would recommend avoiding a dependency if possible and moving the code into this project. Any code we write to parse the annotator output, if it is generalizable, should be maintained in this project.

Assigning to @vemonet to discuss the deployment issue (which is normal, since this is the first time).

twktheainur commented 8 years ago

If the CUIs/groups become available in the output of NCBO BioPortal, then it will be fairly trivial to remove the need to query the triple store.

As for the dependency, it is a generic querying API for the annotator that I wrote, with a dedicated object model decoupled from the JSON representation. I can integrate this code directly into the proxy servlet; however, this would only make sense if the rest of the code were refactored and decoupled from the JSON as well, or if I rewrote it to be coupled with the JSON (for the sake of consistency). A middle-ground solution would be to integrate it as-is now and complete the integration gradually. Let me know which option you prefer.

@jonquet Can you add me as a member of the project, so that I can push a new branch with the changes directly? If the staging tests go well, I will then merge with origin/master HEAD.

jonquet commented 8 years ago

Access should be OK now. I had created an Annotator team but forgot to add the projects to it ;)

OK for the other point. Indeed, if this is an Annotator client, then it doesn't belong here.

twktheainur commented 8 years ago

I have an intermediate solution for the dependency: I can turn the annotator into a multi-module Maven project, with one module for the annotator proxy, one for the generic annotation model and one for the Java client. This would have the logistical advantage of handling a single project on GitHub while ensuring sufficient separation between the three concerns.

jonquet commented 7 years ago

What's the status of this right now? Can we test the BRAT format (at least for the ontologies' CUI/TUI information)?

Populating CUIs in the SIFR BioPortal ontologies is discussed here: https://github.com/sifrproject/sifr_project_java_ontology_processing/issues/3

twktheainur commented 7 years ago

The BRAT output works quite well; you can test it by adding format=brat. It should work on the Annotator+ deployed on the stage portal. I will also deploy it on /annotator on stage so that you can get it for French.
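For a quick test from Java, a sketch of a request with format=brat using the JDK's java.net.http client (Java 11+). The base URL is a placeholder for the stage annotator endpoint, which is not given here. The text and apikey parameters follow the usual BioPortal annotator conventions, assumed to apply to this proxy as well.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client snippet for requesting BRAT output from the annotator.
public class BratRequest {

    // Builds <base>?text=...&apikey=...&format=brat with URL-encoded values.
    public static URI buildUri(String base, String text, String apiKey) {
        String query = "text=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&apikey=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&format=brat";
        return URI.create(base + "?" + query);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and key: replace with the stage portal URL and a real API key.
        URI uri = buildUri("https://stage.example.org/annotator", "fièvre persistante", "YOUR_API_KEY");
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body()); // expected to be BRAT standoff text
    }
}
```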

Note that this is the BRAT format used for the QUAERO corpus evaluation and does not include the context annotations. I haven't yet had the chance to implement a separate, general-purpose BRAT format that would include negation.
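One possible way to carry negation in such a general-purpose format is a BRAT binary attribute line (A<id>, a tab, then Negation T<id>) attached to the negated entity. This is only an assumption about a future design, not what was implemented. The attribute name Negation, the Span record and the class are hypothetical, and the attribute would also have to be declared in the BRAT annotation.conf.

```java
import java.util.List;

// Hypothetical general-purpose BRAT writer that marks negated entities.
public class BratWithNegation {

    // Hypothetical span with a negation flag coming from context detection.
    public record Span(int start, int end, String text, String group, boolean negated) {}

    public static String toBrat(List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int attrId = 1;
        for (int i = 0; i < spans.size(); i++) {
            Span s = spans.get(i);
            String tid = "T" + (i + 1);
            // Text-bound annotation: T<id> TAB <type> <start> <end> TAB <covered text>
            out.append(tid).append('\t').append(s.group()).append(' ')
               .append(s.start()).append(' ').append(s.end())
               .append('\t').append(s.text()).append('\n');
            if (s.negated()) {
                // Binary attribute on the entity above (assumed attribute name "Negation")
                out.append('A').append(attrId++).append('\t')
                   .append("Negation ").append(tid).append('\n');
            }
        }
        return out.toString();
    }
}
```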

jonquet commented 7 years ago

@twktheainur, shouldn't this be closed?

ontoportal-lirmm / annotators #16