Open chengkun-wu opened 10 years ago
Hi Chengkun,
Thanks, I've added the fix and committed the changes to the repository. However, while the system now seems to process Unicode, please avoid using Unicode input if at all possible. All the machine-learning systems, including the parser and the event detection components, have been trained on ASCII text. Therefore, they cannot recognize Unicode characters and might interpret them in unexpected ways. To convert your input to ASCII, you can use a tool such as https://github.com/spyysalo/unicode2ascii.
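To illustrate what such a conversion involves, here is a minimal sketch using only the Python 3 standard library. It is not how unicode2ascii works and is not part of TEES; the helper name `to_ascii` and the `placeholder` parameter are made up for this example. It strips accents via Unicode decomposition and replaces anything else outside ASCII with a placeholder, so a dedicated tool is probably still the better choice for real biomedical text.

```python
import unicodedata


def to_ascii(text, placeholder="?"):
    """Best-effort ASCII conversion (hypothetical helper, not part of TEES)."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the combining accent, keep the base letter
        # Keep ASCII characters; replace everything else with the placeholder.
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("na\u00efve caf\u00e9"))
```

Characters without an ASCII base form (Greek letters, for example) end up as the placeholder here, which loses information that a mapping-based converter could preserve.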
Best Regards, Jari
Hi Jari,
Thanks for the reply.
Yes, I guess avoiding Unicode input is the better solution, as I just ran into other problems (see the updated post above).
Chengkun
Hi Jari,
I'm now trying to do some work on extending TEES. I have an in-house NER tool, PathNER, which detects pathway mentions (please refer to our paper: http://www.ncbi.nlm.nih.gov/pubmed/24555844). How can I use TEES to detect events with both BANNER and PathNER? Do you have any suggestions on best practice?
Thanks!
Chengkun
Hi Jari,
If a text file with Unicode characters is passed to TEES as input, you get a "UnicodeEncodeError: 'ascii' codec can't encode character" exception (tested under Python 2.7.3, 64-bit, on Mac OS X Mavericks). It is caused by the following code (around line 246) in ElementTreeUtils.py, which tries to write the ElementTree to a file.
I had a look at the code; the problem can be solved in the following way:
The reason for this can be found at http://stackoverflow.com/questions/10046755/write-xml-utf-8-file-with-utf-8-data-with-elementtree
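As a generic sketch of the idea discussed in that thread (this is not the actual TEES patch, and it is written for Python 3 rather than the 2.7 setup above), the tree can be serialized with an explicit UTF-8 encoding to a file opened in binary mode. The helper name `write_xml_utf8` and the element names are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_xml_utf8(root, path):
    """Serialize an element tree to `path` as UTF-8 (illustrative helper)."""
    tree = ET.ElementTree(root)
    # Binary mode: ElementTree emits already-encoded bytes, so no implicit
    # ASCII encoding step can fail on non-ASCII characters.
    with open(path, "wb") as f:
        tree.write(f, encoding="utf-8", xml_declaration=True)


# Hypothetical document containing non-ASCII characters (Greek alpha, kappa).
root = ET.Element("document")
sentence = ET.SubElement(root, "sentence")
sentence.set("text", "TNF-\u03b1 activates NF-\u03baB")

out_path = os.path.join(tempfile.mkdtemp(), "example.xml")
write_xml_utf8(root, out_path)
```

Parsing `out_path` again with `ET.parse` should return the attribute text unchanged, which is a quick way to check that the round trip preserves the characters.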
However, even with this problem solved, other problems may follow.
For instance,