nlpcl-lab / ace2005-preprocessing

ACE 2005 corpus preprocessing for Event Extraction task

MIT License

291 stars, 72 forks

preprocessing problem #1

Closed: shiqing1234 closed this issue 5 years ago

shiqing1234 commented 5 years ago

StanfordCore Exception Expecting value: line 1 column 1 (char 0)
item["sentence"] : [ applause ] it is important for you all to understand and for our fellow americans to understand the tax relief that i have proposed and will push for until enacted would create -- [ applause ] will create 1.4 million new jobs by the end of 200 in two years time, this nation has experienced war, a recession and a national emergency.
nlp_text : CoreNLP request timed out. Your document may be too long.

Did you run into this problem? How can I solve it?

bowbowbow commented 5 years ago

@shiqing1234 I ran into the same problem and simply skipped the few sentences that failed because processing timed out.
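Skipping the failing sentences could be sketched as follows. The "Expecting value: line 1 column 1 (char 0)" message is what Python's json module raises when it is handed plain text, such as the "CoreNLP request timed out" reply, instead of JSON. This is only a sketch: `annotate_or_skip` is a hypothetical helper, not code from the repository, and it assumes `annotate` returns the raw response text.

```python
import json


def annotate_or_skip(annotate, sentences, properties):
    """Annotate each sentence and collect the ones whose reply is not JSON.

    `annotate` is any callable with the shape annotate(text, properties=...)
    that returns the raw response text (assumption).
    """
    parsed, skipped = [], []
    for sentence in sentences:
        raw = annotate(sentence, properties=properties)
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            # A timeout reply is plain text, so json.loads() fails here.
            skipped.append((sentence, raw))
    return parsed, skipped
```

Keeping the skipped sentences together with the raw reply makes it easy to see afterwards how many were dropped and why.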

If you want to include all the sentences, how about increasing the timeout parameter value of StanfordCoreNLP? It is currently set to 30 seconds as shown below.

# main.py, line 158
with StanfordCoreNLP('./stanford-corenlp-full-2018-10-05', memory='8g', timeout=30000) as nlp:

shiqing1234 commented 5 years ago

nlp_text = nlp.annotate(sentence, properties={'timeout': '990000', 'annotators': 'tokenize,ssplit,pos,lemma,parse'})

Yes, it works. Thank you very much!
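The two timeouts mentioned in this thread (the constructor argument and the per-request property) can be set together. The following is a minimal sketch, assuming the stanfordcorenlp Python wrapper that main.py uses. `build_properties` and `annotate_sentences` are hypothetical helper names, and 990 seconds is simply the value reported to work above, not a recommended setting.

```python
def build_properties(timeout_seconds, annotators):
    """Build the per-request properties dict passed to annotate().

    The timeout is converted to milliseconds and sent as a string,
    matching the '990000' value used in this thread.
    """
    return {
        'timeout': str(int(timeout_seconds * 1000)),
        'annotators': ','.join(annotators),
    }


def annotate_sentences(corenlp_home, sentences, timeout_seconds=990):
    """Annotate sentences with one shared CoreNLP instance (hypothetical helper)."""
    # Imported lazily so build_properties() works without the package installed.
    from stanfordcorenlp import StanfordCoreNLP

    props = build_properties(
        timeout_seconds, ['tokenize', 'ssplit', 'pos', 'lemma', 'parse'])
    # Assumption: the constructor timeout is in milliseconds, as in main.py.
    with StanfordCoreNLP(corenlp_home, memory='8g',
                         timeout=int(timeout_seconds * 1000)) as nlp:
        return [nlp.annotate(s, properties=props) for s in sentences]
```

A very large timeout keeps long sentences in the dataset, but a single slow sentence can then hold up preprocessing for several minutes.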

orans3 commented 3 years ago

I ran into the same problem. In my case the cause was a % character: when a string contains %, the Stanford CoreNLP pipeline is not able to annotate it with named entities. I replaced % with # and the problem was solved.
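The replacement described above could look like the sketch below. `replace_percent` is a hypothetical helper name, and `#` is just the placeholder chosen in this comment.

```python
def replace_percent(sentence, placeholder='#'):
    """Replace every '%' with a single-character placeholder before annotation."""
    if len(placeholder) != 1:
        # A one-character placeholder keeps the sentence length unchanged.
        raise ValueError('placeholder must be exactly one character')
    return sentence.replace('%', placeholder)


cleaned = replace_percent('unemployment fell by 2% last year')
```

Because one character is swapped for another, the cleaned sentence has the same length as the original, so character offsets into the text remain valid.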