ncbo / ncbo_annotator

Automatically processes a piece of text, annotates it with relevant ontology concepts, and returns the annotations.
http://bioportal.bioontology.org/annotator

Issues with some special characters in annotatorplus api #31

Open dilshans2k opened 5 months ago

dilshans2k commented 5 months ago

Request:

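from typing import Any, Dict
from urllib.parse import quote_plus

import requests

text = "Parkinson Disease % Pneumonia"  # e.g. one of the sample inputs below
encoded_text = quote_plus(text)  # computed, but not used in the URL sent below
apikey = ""
ontologies_to_search = [
    "MONDO"
    ]
format = "json"
# Note: this dict is built but never passed to requests.get().
params: Dict[str, Any] = {
    "apikey": apikey,
    "format": format,
    "ontologies": ontologies_to_search,
    "mappings": True,
    "longest_only": True,
    "exclude_synonyms": False,
    "expand_class_hierarchy": False,
    "class_hierarchy_max_level": 0,
    "text": encoded_text
}
url = "http://services.data.bioontology.org/annotatorplus"
# The raw, unencoded text is interpolated directly into the query string.
url = url + f"?apikey={apikey}&format={format}&ontologies={ontologies_to_search[0]}&mappings={True}&longest_only={True}&exclude_synonyms={False}&class_hierarchy_max_level={0}&text={text}"
r = requests.get(url=url)
r.raise_for_status()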
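Since the request above puts the raw text into the query string, one thing worth ruling out is client-side encoding. The sketch below builds the same URL with the standard library's `urlencode`, so that `%` and `;` are percent-encoded before they reach the server. The helper name `build_annotator_url` is hypothetical, and whether the server then handles these characters correctly is not confirmed anywhere in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "http://services.data.bioontology.org/annotatorplus"


def build_annotator_url(text, apikey, ontologies):
    """Return the annotatorplus URL with every query value percent-encoded.

    Hypothetical helper for illustration; the parameter names mirror the
    request shown above.
    """
    params = {
        "apikey": apikey,
        "format": "json",
        "ontologies": ",".join(ontologies),
        "mappings": True,
        "longest_only": True,
        "exclude_synonyms": False,
        "class_hierarchy_max_level": 0,
        "text": text,  # raw text; urlencode() does the escaping
    }
    # urlencode() applies quote_plus() to each value, so "%" becomes "%25",
    # ";" becomes "%3B" and spaces become "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_annotator_url("Parkinson Disease % Pneumonia", "", ["MONDO"]))
```

With `requests`, passing the dictionary through the `params=` argument of `requests.get` performs a comparable encoding, which avoids building the query string by hand.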

Issue with %

If the input text contains a % with adjacent whitespace (as in the sample below), the API returns a 500 Internal Server Error.

Sample input: text=Parkinson Disease % Pneumonia

Server response:

<body>
    <h1>HTTP Status 500 – Internal Server Error</h1>
    <hr class="line" />
    <p><b>Type</b> Exception Report</p>
    <p><b>Message</b> Unexpected end of input at 1:1</p>
    <p><b>Description</b> The server encountered an unexpected condition that prevented it from fulfilling the request.
    </p>
    <p><b>Exception</b></p>
    <pre>com.eclipsesource.json.ParseException: Unexpected end of input at 1:1
    com.eclipsesource.json.JsonParser.error(JsonParser.java:490)
    com.eclipsesource.json.JsonParser.expected(JsonParser.java:484)
    com.eclipsesource.json.JsonParser.readValue(JsonParser.java:193)
    com.eclipsesource.json.JsonParser.parse(JsonParser.java:152)
    com.eclipsesource.json.JsonParser.parse(JsonParser.java:91)
    com.eclipsesource.json.Json.parse(Json.java:295)
    org.sifrproject.annotations.input.BioPortalJSONAnnotationParser.parseAnnotations(BioPortalJSONAnnotationParser.java:65)
    org.sifrproject.servlet.AnnotatorServlet.doPost(AnnotatorServlet.java:177)
    org.sifrproject.servlet.AnnotatorServlet.doGet(AnnotatorServlet.java:118)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:655)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:764)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
    org.sifrproject.util.CharacterSetFilter.doFilter(CharacterSetFilter.java:24)
</pre>
    <p><b>Note</b> The full stack trace of the root cause is available in the server logs.</p>
    <hr class="line" />
    <h3>Apache Tomcat/9.0.62</h3>
</body>

Issue with ;

  1. If the text starts with ;, the API returns 200 OK, but the response body contains an error.

    Sample input: text: ;Disease

    Sample output:

    [
        {
            "error": "{"errors":["A text to be annotated must be supplied using the argument text=<text to be annotated>"],"status":400}
    "}]
  2. If the text contains ;, only the entities before the ; are annotated.

    Sample input1: text: PARKINSON DISEASE PARKINSON's DISEASE

Sample output1:

[
    {
        "annotatedClass": {
            "definition": [
                "A progressive degenerative disorder of the central nervous system characterized by loss of dopamine producing neurons in the substantia nigra and the presence of Lewy bodies in the substantia nigra and locus coeruleus. Signs and symptoms include tremor which is most pronounced during rest, muscle rigidity, slowing of the voluntary movements, a tendency to fall back, and a mask-like facial expression."
            ],
            "prefLabel": "Parkinson disease",
            "synonym": [
                "paralysis agitans",
                "Parkinson disease",
                "Parkinson's disease"
            ],
..........................
        "hierarchy": [],
        "annotations": [
            {
                "from": 1,
                "to": 17,
                "matchType": "PREF",
                "text": "PARKINSON DISEASE"
            },
            {
                "from": 19,
                "to": 37,
                "matchType": "SYN",
                "text": "PARKINSON'S DISEASE"
            }
        ],
        "mappings": []
    }
]

Sample input2: text = PARKINSON DISEASE; PARKINSON's DISEASE

Sample output2:

    [
        {
            "annotatedClass": {
                "definition": [
                    "A progressive degenerative disorder of the central nervous system characterized by loss of dopamine producing neurons in the substantia nigra and the presence of Lewy bodies in the substantia nigra and locus coeruleus. Signs and symptoms include tremor which is most pronounced during rest, muscle rigidity, slowing of the voluntary movements, a tendency to fall back, and a mask-like facial expression."
                ],
                "prefLabel": "Parkinson disease",
                "synonym": [
                    "paralysis agitans",
                    "Parkinson disease",
                    "Parkinson's disease"
                ],
    ............................
            "annotations": [
                {
                    "from": 1,
                    "to": 17,
                    "matchType": "PREF",
                    "text": "PARKINSON DISEASE"
                }
            ],
            "mappings": []
        }
    ]

As the output shows, only the first mention, PARKINSON DISEASE, was annotated; PARKINSON's DISEASE, which follows the ;, was skipped.

syphax-bouazzouni commented 5 months ago

Hello @dilshans2k,

Thank you for the detailed report. We are not currently doing any development on the annotatorplus project, but we will make sure to fix this in a future iteration.

A temporary fix is to remove special characters from the submitted text, using a regex such as [^\w\s].
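A minimal sketch of that workaround in Python, using the suggested pattern. The function name `strip_special_characters` is hypothetical. Replacing each match with a single space, rather than deleting it, is an assumption made here so that the string length stays the same and the "from"/"to" offsets in the response still line up with the original text.

```python
import re

# Anything that is neither a word character nor whitespace.
SPECIAL_CHARS = re.compile(r"[^\w\s]")


def strip_special_characters(text):
    """Replace every special character with one space (length-preserving)."""
    return SPECIAL_CHARS.sub(" ", text)


if __name__ == "__main__":
    print(strip_special_characters("PARKINSON DISEASE; PARKINSON's DISEASE"))
```

Note that this pattern also removes apostrophes, so "PARKINSON's DISEASE" becomes "PARKINSON s DISEASE", which may change which synonyms are matched.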

For reference, here are the related issues: https://github.com/ontoportal-lirmm/annotators/issues/49 and https://github.com/ontoportal-lirmm/bioportal_web_ui/issues/558, and the temporary fix we applied at AgroPortal to remove special characters from the submitted text: https://github.com/ontoportal-lirmm/bioportal_web_ui/pull/561

FYI @jonquet, @Bilelkihal

dilshans2k commented 4 months ago

Thanks for the prompt reply. Yes, the solution provided is one way to get it working.

I was wondering, is the annotatorplus repo open source? Also, is the database or the knowledge graph (KG) publicly available?

syphax-bouazzouni commented 4 months ago

Hello,

Yes, the annotatorplus repo is open source; feel free to propose contributions here: https://github.com/ontoportal-lirmm/annotators

The KG is not publicly available, as BioPortal doesn't offer a SPARQL endpoint, but you can use the API at https://data.bioontology.org/ to access all the public ontologies and terms.
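As a rough illustration of querying that API for terms, the sketch below builds a term-search URL. The `/search` path and the `q` and `ontologies` parameters are assumptions based on the public BioPortal REST API rather than anything stated in this thread, and `build_search_url` is a hypothetical helper, so check the API documentation before relying on it.

```python
from urllib.parse import urlencode

API_ROOT = "https://data.bioontology.org"


def build_search_url(query, apikey, ontologies=None):
    """Build a term-search URL (assumed endpoint: /search?q=...)."""
    params = {"q": query, "apikey": apikey}
    if ontologies:
        # Assumed: a comma-separated list of ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{API_ROOT}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Parkinson's disease", "", ["MONDO"]))
```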