arwhirang closed this issue 7 years ago.
The TEES preprocessor can be used to prepare any data in TXT or Interaction XML format for use with the TEES training or classification programs.
Thank you for the fast response. Although this issue is closed, I will post my results with the preprocessor here.
I have tried the preprocessor, but it seems that it does not fully support the DDI XML format.
As a test, I tried to preprocess an XML file from the original DDI'11 data, "Abciximab_ddi.xml", but I ran into errors that small fixes did not resolve.
(For example, after fixing some errors, I reached this line in the file GeniaSentenceSplitter.py:
sentenceOffset = Range.charOffsetToSingleTuple(sentence.get("charOffset"))
The sentence element does not have a "charOffset" attribute ...)
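A quick way to see how far a DDI'11 file is from what the sentence splitter expects is to list the elements that lack the attributes it reads. The sketch below is hypothetical and not part of TEES: the sentence-level "charOffset" requirement comes from the error above, while the document-level "text" requirement is an assumption based on the Interaction XML layout.

```python
import xml.etree.ElementTree as ET

# Assumed requirements: "charOffset" on <sentence> is taken from the
# GeniaSentenceSplitter.py error; "text" on <document> is an assumption.
REQUIRED = {"document": ("text",), "sentence": ("charOffset",)}


def find_missing_attributes(root: ET.Element, required=REQUIRED) -> list:
    """Return (tag, id, attribute) for every required attribute that is absent."""
    missing = []
    for element in root.iter():  # iter() also visits the root element itself
        for attribute in required.get(element.tag, ()):
            if element.get(attribute) is None:
                missing.append((element.tag, element.get("id", "?"), attribute))
    return missing
```

For example, `find_missing_attributes(ET.parse("Abciximab_ddi.xml").getroot())` would list every sentence that would fail at the `sentence.get("charOffset")` call.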
I would really like to try TEES, but getting my data through the preprocessing has been difficult. What do you suggest for this case?
The preprocessor input must be either TXT or Interaction XML files. The DDIExtraction Shared Task format (such as the file "Abciximab_ddi.xml") is related to Interaction XML, but is not exactly the same file format.
In order to use the preprocessor, you must convert your data into either TXT files or into Interaction XML files. For more documentation on Interaction XML please see https://github.com/jbjorne/TEES/wiki/Interaction-XML. You can also look at the corpus files installed by TEES (by default these can be found at ~/.tees/corpora) for more examples of Interaction XML files.
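As a starting point for such a conversion, the sketch below adds the pieces the sentence splitter was missing: it rebuilds a document-level "text" by joining sentence texts and gives each sentence a "charOffset", then wraps the document in a `<corpus>` root. It is hypothetical and assumes that each DDI'11 file has a single `<document>` root whose `<sentence>` elements carry a "text" attribute, and that offsets are written as end-exclusive "begin-end" (check the Interaction XML wiki for the convention your TEES version uses). Entity offsets and `<pair>` elements are left untouched and may need further changes.

```python
import xml.etree.ElementTree as ET


def add_sentence_offsets(document: ET.Element, separator: str = " ") -> None:
    """Give a DDI'11-style <document> a "text" attribute and each <sentence> a "charOffset".

    Hypothetical helper (not part of TEES). Assumption: offsets are "begin-end"
    with an end-exclusive end, so document_text[begin:end] == sentence text.
    """
    pieces = []
    position = 0
    for sentence in document.findall("sentence"):
        text = sentence.get("text", "")
        if pieces:
            # Every sentence except the first is preceded by the separator.
            pieces.append(separator)
            position += len(separator)
        begin, end = position, position + len(text)
        sentence.set("charOffset", f"{begin}-{end}")
        pieces.append(text)
        position = end
    document.set("text", "".join(pieces))


def convert_file(in_path: str, out_path: str, source: str = "DDI11") -> None:
    """Wrap one DDI'11 file (assumed single <document> root) in a <corpus> element."""
    document = ET.parse(in_path).getroot()
    add_sentence_offsets(document)
    corpus = ET.Element("corpus", {"source": source})  # "source" value is arbitrary here
    corpus.append(document)
    ET.ElementTree(corpus).write(out_path, encoding="utf-8", xml_declaration=True)
```

Joining with a single space keeps the offsets simple to verify; if the original documents had different spacing, the rebuilt text will not match it character for character.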
Oh, thank you. My mistake for missing the Interaction-XML wiki page.
Hello, I found that the NLM corpus is very similar to the DDI corpus.
I tried to apply the TEES code to the data, but the conversion process requires preprocessed data that, in the case of DDI, is only available as a download.
Is there any way to apply TEES preprocessing to a new dataset?