DKPro does not yet have a reader for the tagged plain-text corpora that come
with the PTB.
Points for discussion:
- The corpora contain noun phrase annotations in addition to the tags. Is there a
type for annotating noun phrases in DKPro?
- Tokens occasionally have two or more possible part-of-speech tags in case of
ambiguity. How should those be handled? Take only the first one?
- The Switchboard corpus in the PTB additionally marks wrongly tagged words.
How should those be handled? Is there a 'no-tag' attribute value for a UIMA POS type?
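
To make the first two points concrete, here is a minimal parsing sketch in Python (not DKPro code). It assumes the tagged files use `word/TAG` items separated by whitespace, `[` and `]` as standalone items around noun phrases, `|` between alternative tags, `\/` for a literal slash inside a word, and lines of `=` as separators. The names `TaggedToken` and `parse_tagged_line` are hypothetical, and the Switchboard markers for wrongly tagged words are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaggedToken:
    text: str
    tags: List[str]          # all alternatives, in file order
    np_index: Optional[int]  # index of the enclosing noun phrase, or None

    @property
    def primary_tag(self) -> str:
        # "Take only the first one": one possible policy for ambiguous tags.
        return self.tags[0]


def parse_tagged_line(line: str) -> List[TaggedToken]:
    """Parse one line of the assumed tagged format into tokens."""
    tokens: List[TaggedToken] = []
    current_np: Optional[int] = None
    np_count = 0
    for item in line.split():
        if item == "[":                  # noun phrase opens
            current_np = np_count
            np_count += 1
            continue
        if item == "]":                  # noun phrase closes
            current_np = None
            continue
        if set(item) == {"="}:           # separator line such as "====="
            continue
        # Split on the LAST slash so escaped slashes stay in the word.
        word, sep, tag = item.rpartition("/")
        if not sep:                      # not a word/TAG item, skip it
            continue
        tokens.append(
            TaggedToken(word.replace("\\/", "/"), tag.split("|"), current_np)
        )
    return tokens


if __name__ == "__main__":
    sample = r"[ The/DT plan/NN ] calls/VBZ|NNS for/IN [ 1\/2/CD ]"
    for tok in parse_tagged_line(sample):
        print(tok.text, tok.primary_tag, tok.tags, tok.np_index)
```

Keeping all alternatives in `tags` and exposing the first one through `primary_tag` leaves the choice of policy open: a reader could write only the first tag into the POS annotation and still keep the others available if a suitable feature exists.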
Original issue reported on code.google.com by Tobias.H...@gmail.com on 1 Aug 2014 at 11:12