Say you invoke Tarsqi as follows:
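The command itself is missing here; presumably it was an invocation along these lines, with a pipeline that lists both PREPROCESSOR and the individual components (the file names are placeholders):

```sh
python tarsqi.py --pipeline=PREPROCESSOR,TOKENIZER,TAGGER,CHUNKER input.xml output.xml
```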
You now get a funky duplication of lex tags, where one tag has just the token information and the other has the pos attribute as well.
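Something like the following, a hypothetical sketch of the output (ids, offsets and the token are made up): the first lex tag carries only the token information, its twin also carries the pos:

```xml
<lex id="l1" begin="0" end="3">The</lex>
<lex id="l2" begin="0" end="3" pos="DT">The</lex>
```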
What happens is that the tokenizer, tagger and chunker run once because of PREPROCESSOR, and then run again because of TOKENIZER,TAGGER,CHUNKER. Downstream components, starting with the chunker, will break on the missing pos attributes on some of the lex tags (half of them, in fact).

Solution: use either PREPROCESSOR or TOKENIZER,TAGGER,CHUNKER, but not both.
Also, this should be made clear in the documentation.
And perhaps Tarsqi should be a bit smarter and check the input over which it runs.
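Such a check could be as simple as the following sketch (not part of Tarsqi; the function name and the assumption that PREPROCESSOR subsumes exactly these three components are mine): reject any pipeline that lists PREPROCESSOR alongside a component it already runs.

```python
# Components that PREPROCESSOR already runs internally (assumed set,
# based on the behavior described above).
PREPROCESSOR_SUBSUMES = {"TOKENIZER", "TAGGER", "CHUNKER"}


def validate_pipeline(components):
    """Hypothetical sanity check: raise ValueError if the pipeline
    would run the same component twice."""
    seen = set(components)
    overlap = seen & PREPROCESSOR_SUBSUMES if "PREPROCESSOR" in seen else set()
    if overlap:
        raise ValueError(
            "PREPROCESSOR already runs %s; remove the duplicate components"
            % ", ".join(sorted(overlap)))
    return components
```

With this in place, `validate_pipeline(["PREPROCESSOR", "TOKENIZER", "TAGGER", "CHUNKER"])` would fail loudly at startup instead of producing duplicated lex tags that break the chunker later on.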