jbjorne / TEES

Turku Event Extraction System

Error processing BioNLP 2013 test set #10

Open ajjimeno opened 11 years ago

ajjimeno commented 11 years ago

We are evaluating the GE13 model distributed with TEES on the official BioNLP 2013 test data, in order to complete our report on the BioNLP 2013 shared task. We have run into several problems, described below. We would also like to know whether you have results for these models on the official test set; we would really appreciate it if you could share them. Maybe they are simply the same as your official runs.

The first problem is that the preprocessing step has trouble with the file names, e.g. PMC-2817507-01-1._Introduction.*. I renamed the files so that each name is a number plus the extension, e.g. 1.txt, 1.a1, 1.json, 2.txt, 2.a1, 2.json, ... After this change the documents loaded without any problem.
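For reference, the renaming can be sketched roughly as follows. This is a minimal illustration, not the exact script we ran: the function name number_documents, the choice to copy into a separate directory, and the extension list are assumptions made for the example.

```python
import os
import shutil
from collections import defaultdict


def number_documents(src_dir, dst_dir, extensions=(".txt", ".a1", ".json")):
    """Copy files to dst_dir, replacing each document stem with a running number.

    Hypothetical helper: files that share a stem (everything before the last
    extension) are treated as one document and get the same number.
    Returns a dict mapping each original stem to its new numeric stem.
    """
    os.makedirs(dst_dir, exist_ok=True)

    # Group the extensions found for each stem, e.g.
    # "PMC-2817507-01-1._Introduction" -> [".a1", ".json", ".txt"]
    by_stem = defaultdict(list)
    for name in sorted(os.listdir(src_dir)):
        stem, ext = os.path.splitext(name)
        if ext in extensions:
            by_stem[stem].append(ext)

    # Assign numbers in sorted stem order so the mapping is reproducible.
    mapping = {}
    for number, stem in enumerate(sorted(by_stem), start=1):
        mapping[stem] = str(number)
        for ext in by_stem[stem]:
            shutil.copy(
                os.path.join(src_dir, stem + ext),
                os.path.join(dst_dir, str(number) + ext),
            )
    return mapping
```

Keeping the returned mapping makes it possible to translate the numbered output back to the original document names before evaluation.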

Preprocessing then completes successfully, but classification fails (the preprocessed output file is available from https://www.dropbox.com/s/hhred2ise9tluoq/BioNLP-ST-2013_GE_test_data_rev1_TEES-preprocessed.xml.gz).

The following call was used with the preprocessed data:

python /usr/share/TEES/v2.1/classify.py -i BioNLP-ST-2013_GE_test_data_rev1_TEES-preprocessed.xml.gz -o BioNLP-ST-2013_GE_test_data_rev1_TEES/GE13 -m GE13

The trace of the error is:

[16:13:09 14/06] Traceback (most recent call last):
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/classify.py", line 190, in <module>
[16:13:09 14/06]     preprocessorParams=options.preprocessorParams, bioNLPSTParams=options.bioNLPSTParams)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/classify.py", line 78, in classify
[16:13:09 14/06]     detector.classify(classifyInput, model, output, goldData=goldInput, fromStep=detectorSteps["CLASSIFY"], omitSteps=omitDetectorSteps["CLASSIFY"], workDir=workDir)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/Detectors/EventDetector.py", line 339, in classify
[16:13:09 14/06]     xml = self.edgeDetector.classifyToXML(xml, self.model, None, workOutputTag, goldData=goldData, parse=self.parse)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/Detectors/SingleStageDetector.py", line 172, in classifyToXML
[16:13:09 14/06]     return self.exampleWriter.write(exampleFileName, predictions, data, tag+self.tag+"pred.xml.gz", model.get(self.tag+"ids.classes"), parse, exampleStyle=exampleStyle, structureAnalyzer=self.structureAnalyzer)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/ExampleWriters/SentenceExampleWriter.py", line 30, in write
[16:13:09 14/06]     return self.writeXML(examples, predictions, corpus, outputFile, classSet, parse, tokenization, goldCorpus, exampleStyle=exampleStyle, structureAnalyzer=structureAnalyzer)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/ExampleWriters/SentenceExampleWriter.py", line 94, in writeXML
[16:13:09 14/06]     self.writeXMLSentence(exampleQueue, predictionsByExample, sentenceObject, classSet, classIds, goldSentence=goldSentence, exampleStyle=exampleStyle, structureAnalyzer=structureAnalyzer) # process queue
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/ExampleWriters/EdgeExampleWriter.py", line 54, in writeXMLSentence
[16:13:09 14/06]     entityById = self.getEntityByIdMap(sentenceElement)
[16:13:09 14/06]   File "/usr/share/TEES/v2.1/ExampleWriters/EdgeExampleWriter.py", line 34, in getEntityByIdMap
[16:13:09 14/06]     assert eId not in entityById, eId
[16:13:09 14/06] AssertionError: TEES.d0.s0.e0
[16:13:09 14/06] Counter "Write Examples" did not finish
[16:13:09 14/06] Last count: 0/37123
[16:13:09 14/06] Last update: None
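The assertion in getEntityByIdMap appears to fire when an entity id (here TEES.d0.s0.e0) occurs more than once within a sentence. A quick way to test that hypothesis on the preprocessed file might look like the sketch below. It assumes the file uses "sentence" elements containing "entity" elements with an "id" attribute; the function name find_duplicate_entity_ids is hypothetical and is not part of TEES.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_entity_ids(path):
    """Return {sentence id: [entity ids that occur more than once]}.

    Assumption: the XML contains <sentence> elements whose <entity>
    descendants each carry an "id" attribute.
    """
    # Read gzip-compressed or plain XML depending on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        root = ET.parse(handle).getroot()

    duplicates = {}
    for sentence in root.iter("sentence"):
        counts = Counter(entity.get("id") for entity in sentence.iter("entity"))
        # Entities without an id attribute are ignored.
        repeated = sorted(
            eid for eid, n in counts.items() if eid is not None and n > 1
        )
        if repeated:
            duplicates[sentence.get("id")] = repeated
    return duplicates
```

An empty result would suggest the duplicate ids are introduced later in the pipeline rather than being present in the preprocessed input.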

Thank you in advance, Antonio