vnadgir / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

PennTreeBank Reader for tagged corpora #439

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
DKPro does not yet have a reader for the tagged plain-text corpora that come 
with the PTB.

Points for discussion:
- The corpora contain noun phrase annotations in addition to the tags. Is there 
a type for annotating noun phrases in DKPro?

- Tokens occasionally have two or more possible part-of-speech tags in cases of 
ambiguity. How should those be handled? Take only the first one?

- The Switchboard corpus in the PTB additionally marks wrongly tagged words. 
How should those be handled? Is there a 'no-tag' attribute value for a UIMA-Pos 
type?
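
To make the first two points concrete, here is a minimal sketch of how one line of the tagged files might be split up. It is not an existing DKPro reader. It assumes that tokens are written as word/TAG, that noun phrases are delimited by standalone "[" and "]" items, that alternative tags are separated by "|", and that a slash inside a word is escaped as "\/". The class and method names (PtbTaggedLineSketch, TaggedToken, parseLine) are hypothetical, and the Switchboard markers for wrongly tagged words are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of DKPro: parses one line of a tagged
// plain-text file under the format assumptions stated above.
class PtbTaggedLineSketch {

    /** One token with all candidate tags and its noun phrase membership. */
    static final class TaggedToken {
        final String text;
        final List<String> tags;      // every alternative, in file order
        final boolean inNounPhrase;   // true if between "[" and "]"

        TaggedToken(String text, List<String> tags, boolean inNounPhrase) {
            this.text = text;
            this.tags = tags;
            this.inNounPhrase = inNounPhrase;
        }

        /** The "take only the first one" strategy for ambiguous tokens. */
        String firstTag() {
            return tags.get(0);
        }
    }

    static List<TaggedToken> parseLine(String line) {
        List<TaggedToken> result = new ArrayList<>();
        boolean inChunk = false;
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue;
            }
            if (item.equals("[")) {   // assumed noun phrase start
                inChunk = true;
                continue;
            }
            if (item.equals("]")) {   // assumed noun phrase end
                inChunk = false;
                continue;
            }
            // The tag follows the last slash; earlier slashes belong to the word.
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                continue;             // not a word/TAG item, e.g. a separator line
            }
            String text = item.substring(0, slash).replace("\\/", "/");
            List<String> tags = Arrays.asList(item.substring(slash + 1).split("\\|"));
            result.add(new TaggedToken(text, tags, inChunk));
        }
        return result;
    }

    public static void main(String[] args) {
        for (TaggedToken t : parseLine("[ the/DT cat/NN ] sat/VBD|VBN")) {
            System.out.println(t.text + " " + t.tags + " inNP=" + t.inNounPhrase);
        }
    }
}
```

Keeping all alternatives in the parsed token leaves the decision open: a reader could store only firstTag() in the POS annotation, or expose a parameter that selects a different strategy.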

Original issue reported on code.google.com by Tobias.H...@gmail.com on 1 Aug 2014 at 11:12

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2682.

- Copying over missing parameter descriptions from ComponentParameters

Original comment by richard.eckart on 4 Aug 2014 at 3:02