jeisner / treebank-scripts

Suite of scripts for preprocessing the Penn Treebank, primarily to extract lexical subcategorization frames and dependencies.
MIT License

distinguish optional from obligatory arguments in Collins-style marking #17

Open jeisner opened 8 years ago

jeisner commented 8 years ago

[item from the old TO-DO file dated 2002-04-07]

Mike's argument marking doesn't capture the optional vs. obligatory distinction. For example, determiners are treated as non-arguments, so there's no notion of repeated determiners being bad. But this is unlikely to hurt Mike [Collins], because [he uses a discriminative model and] repeated determiners don't show up in the text. That's why using a smoothed model to actually parse [rather than looking at generative probability] would be a fairer evaluation ... at least for the supervised case.
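
To make the missing distinction concrete, here is a minimal Python sketch of what a three-way marking could look like. It is illustrative only and not taken from these scripts or from Collins's marking: the slot kinds, the `violations` helper, and the toy frames are all hypothetical names. The assumption is that each head carries a frame mapping dependent labels to a kind, and that any label not listed is a freely repeatable adjunct.

```python
from collections import Counter

# Hypothetical slot kinds; none of these names come from the scripts.
OBLIGATORY = "obligatory"  # must be filled exactly once
OPTIONAL = "optional"      # may be filled at most once


def violations(slots, dependents):
    """List the ways `dependents` breaks the frame `slots`.

    slots      -- dict mapping a dependent label to OBLIGATORY or OPTIONAL
    dependents -- labels of the dependents actually attached to the head
    Labels absent from `slots` are treated as adjuncts and may repeat.
    """
    counts = Counter(dependents)
    problems = []
    for label, kind in slots.items():
        n = counts[label]
        if kind == OBLIGATORY and n == 0:
            problems.append(f"missing obligatory {label}")
        if n > 1:
            problems.append(f"repeated {label} ({n} times)")
    return problems


# Toy frame for a noun: a determiner may appear, but at most once.
NOUN_SLOTS = {"DT": OPTIONAL}
# Toy frame for a transitive verb: the object NP must appear.
VERB_SLOTS = {"NP": OBLIGATORY}

print(violations(NOUN_SLOTS, ["DT", "JJ", "JJ"]))  # adjectives repeat freely
print(violations(NOUN_SLOTS, ["DT", "DT", "JJ"]))  # second determiner flagged
print(violations(VERB_SLOTS, ["ADVP"]))            # object is missing
```

Under a plain argument/non-argument marking, the determiner would fall in the same bucket as the adjectives, so the second call would report nothing. Treating it as an at-most-once slot is what lets repeated determiners count as bad.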