ncbo / ncbo_rest_sample_code

Sample code that demonstrates the use of the NCBO REST services
http://data.bioontology.org/documentation

Maximum size limited for NCBO annotator service? #3

Closed leej35 closed 8 years ago

leej35 commented 8 years ago

When I submit a very long text query to the NCBO annotator service using Python 3.5 with urllib3, I get this error:

exceptions.MaxRetryError: HTTPConnectionPool(host='data.bioontology.org', port=80): Max retries exceeded with url: /annotator?text=Prevention+and+Early+Detection+of+ ... (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))

Short text queries are annotated without any problem, but once the query text reaches about 20 KB I get the error above.

Presumably there is a maximum query length allowed by the annotator service. If so, can you tell me exactly what it is?

For your information, I have attached my Python code.

Thanks, Jeongmin

import json
import urllib3
import urllib
import traceback
import sys
import re
import glob
from time import sleep

# user parameters
TEXT_DIR = '../data/text/'
JSON_DIR = '../data/json/'

apikey=''
REST_URL = "http://data.bioontology.org"

ontology_list = 'ICD9CM,LOINC,NDFRT,RXNORM,SNOMEDCT'
tui_list = 'T017,T029,T023'
options = '&longest_only=true&exclude_numbers=false&whole_word_only=true&exclude_synonyms=false'
param = '&ontologies=' + ontology_list + '&semantic_types=' + tui_list + options

def get_json(text):
    # create request_url
    request_url = REST_URL + "/annotator?text=" + text.replace(' ','+') + param + "&apikey=" + apikey
    # get data as json type
    http = urllib3.PoolManager() 
    r = http.request('POST', request_url, headers={'Authorization': 'apikey token=' + apikey})
    print('request_url: '+request_url)
    print('http status: '+ str(r.status))
    data_json = json.loads(r.data.decode('utf-8'))
    return data_json

def main():
    for filename in glob.glob(TEXT_DIR+'*.txt'):
        # for each file load file 
        text = ''
        lines = open(filename,"r").read().splitlines()
        for l in lines:
            text = text + l.rstrip()
        # remove special characters
        text = re.sub('[^A-Za-z0-9]+', ' ', text)
        # get json
        data = get_json(text)
        # save to json file
        filename_nodir = filename.split('/')[-1].split('.')[0]
        json_fn = '' + filename_nodir + '.json'
        # print(json_fn)
        with open(JSON_DIR+json_fn, 'w') as outfile:
            json.dump(data, outfile)

if __name__ == "__main__":
    main()
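
Below is a minimal sketch of a variant of get_json that sends the text and the other parameters as form fields in the POST body instead of in the URL, which keeps the request line short no matter how large the text is. It assumes the annotator accepts form-encoded POST parameters; the function name get_json_post is only for illustration.

import json
import urllib3

REST_URL = "http://data.bioontology.org"
apikey = ''  # your API key

def get_json_post(text):
    # Send all parameters in the POST body (application/x-www-form-urlencoded)
    # so the URL stays short regardless of the size of the text.
    http = urllib3.PoolManager()
    fields = {
        'text': text,
        'ontologies': 'ICD9CM,LOINC,NDFRT,RXNORM,SNOMEDCT',
        'semantic_types': 'T017,T029,T023',
        'longest_only': 'true',
        'exclude_numbers': 'false',
        'whole_word_only': 'true',
        'exclude_synonyms': 'false',
    }
    r = http.request(
        'POST',
        REST_URL + '/annotator',
        fields=fields,
        encode_multipart=False,  # form-encode the body instead of multipart
        headers={'Authorization': 'apikey token=' + apikey},
    )
    print('http status: ' + str(r.status))
    return json.loads(r.data.decode('utf-8'))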
leej35 commented 8 years ago

I think I posted this in the wrong place, so I am closing it. If possible, could you delete this issue?