Closed: attardi closed this issue 3 years ago
Could you give me some examples? What do you expect to be returned when passing a line in which a word is tokenized into several pieces?
I am passing it plain text, either a file or a string. It invokes the tokenizer for the given language and gets the output in CoNLL format, which the parser then reads.
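To make the expected output concrete, here is a minimal sketch of turning plain text into CoNLL-style lines. The names `tokenize` and `to_conll` are hypothetical, the regex tokenizer is only a stand-in for a real language-specific tokenizer, and the 10-column layout with `_` placeholders is an assumption based on the CoNLL-U convention, not the content of the attached fix.

```python
import re


def tokenize(text):
    """Hypothetical stand-in tokenizer: splits words from punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def to_conll(sentences):
    """Render tokenized sentences as 10-column CoNLL-style lines.

    Only the ID and FORM columns are filled; the rest are '_' (assumed layout).
    """
    blocks = []
    for tokens in sentences:
        rows = ["\t".join([str(i), tok] + ["_"] * 8)
                for i, tok in enumerate(tokens, 1)]
        blocks.append("\n".join(rows))
    # Sentences are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


text = "The parser reads this.\nIt works."
conll = to_conll(tokenize(line) for line in text.splitlines() if line.strip())
print(conll)
```

Each input line is treated as one sentence here; a real tokenizer would normally do its own sentence splitting.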
I enclose a fix to the code. tokenizer.py.txt
@attardi that seems feasible.
A simple change is needed to integrate a tokenizer. In the file utils/transform.py, add the following optional parameter to the method CoNLL.transform.init():
and then set
and change CoNLL.load() to use it:
You can then pass an NLTK tokenizer or a Stanza tokenizer as the reader. I use this code to interface to Stanza:
tokenizer.py.txt
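For readers without the attachment, the change described above could look roughly like the sketch below. This is only an illustration under stated assumptions: `CoNLLSketch` is a hypothetical stand-in class rather than the project's actual `CoNLL` transform, the parameter name `reader` is taken from the comment above, and `simple_reader` is a made-up placeholder for an NLTK or Stanza wrapper.

```python
import os
import re


class CoNLLSketch:
    """Hypothetical stand-in for the CoNLL transform, not the project's class."""

    def __init__(self, reader=None):
        # Optional callable: raw text -> iterable of token lists, one per sentence.
        self.reader = reader

    def load(self, data):
        # If `data` names an existing file, read its contents; otherwise treat it as text.
        if isinstance(data, str) and os.path.exists(data):
            with open(data, encoding="utf-8") as f:
                data = f.read()
        if self.reader is not None:
            return [list(tokens) for tokens in self.reader(data)]
        # Fallback: one sentence per non-empty line, split on whitespace.
        return [line.split() for line in data.splitlines() if line.strip()]


def simple_reader(text):
    """Placeholder reader (assumption): separates punctuation from words."""
    for line in text.splitlines():
        if line.strip():
            yield re.findall(r"\w+|[^\w\s]", line)


print(CoNLLSketch().load("Hi there."))
print(CoNLLSketch(reader=simple_reader).load("Hi there."))
```

Keeping the reader as a plain callable means any tokenizer can be plugged in through a thin wrapper, and the default behaviour is unchanged when no reader is given.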