jeisner / treebank-scripts

Suite of scripts for preprocessing the Penn Treebank, primarily to extract lexical subcategorization frames and dependencies.
MIT License

some linguistically odd head choices #5

Open jeisner opened 8 years ago

jeisner commented 8 years ago

[item from the old TO-DO file dated 2002-04-07]

In

(NP (NP (DT the) (NN computer) (NN language) ) (VP (VBN called) (S (NP-SBJ (-NONE- *) ) (NP-PRD (NNP UNIX) ))))

we currently get UNIX as the head of an S. Yuck! Many nouns show up as heads of S in exactly this configuration, as the superficial object of "called."
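To make the problem concrete, here is a minimal Python sketch that finds S nodes whose lexical head is a noun. It is not the repository's code: the toy head rules in `head_leaf` (VP first for S, rightmost N-initial child for NP, leftmost child otherwise) and all function names are assumptions made up for illustration.

```python
import re

TREE = ("(NP (NP (DT the) (NN computer) (NN language) ) (VP (VBN called) "
        "(S (NP-SBJ (-NONE- *) ) (NP-PRD (NNP UNIX) ))))")

def parse(s):
    """Parse a Treebank bracketing into (label, children) tuples.
    A preterminal is (tag, [word])."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)
    return node()

def is_empty(t):
    """True if the subtree dominates only empty elements (-NONE-)."""
    label, kids = t
    if label == "-NONE-":
        return True
    return all(not isinstance(k, str) and is_empty(k) for k in kids)

def head_leaf(t):
    """Return (tag, word) of the lexical head under hypothetical toy rules."""
    label, kids = t
    if isinstance(kids[0], str):
        return (label, kids[0])
    cands = [k for k in kids if not is_empty(k)]
    base = label.split("-")[0]
    if base == "S":      # assumed rule: VP if present, else rightmost child
        pick = next((k for k in cands if k[0].startswith("VP")), cands[-1])
    elif base == "NP":   # assumed rule: rightmost N-initial child
        pick = next((k for k in reversed(cands) if k[0].startswith("N")), cands[-1])
    else:                # assumed rule: leftmost child
        pick = cands[0]
    return head_leaf(pick)

def noun_headed_S(t):
    """Collect head words of S nodes whose lexical head carries a noun tag."""
    label, kids = t
    if isinstance(kids[0], str):
        return []
    found = []
    if label.split("-")[0] == "S" and not is_empty(t):
        tag, word = head_leaf(t)
        if tag.startswith("NN"):
            found.append(word)
    for k in kids:
        found += noun_headed_S(k)
    return found

print(noun_headed_S(parse(TREE)))
```

Because the NP-SBJ child is empty, the only remaining candidate under S is the NP-PRD, so the noun percolates up as the head of the S. Running a check like this over the corpus would give a rough count of how often the configuration occurs.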

SBAR -> @ WHNP S/NP, e.g., "in which we trusted," is currently headed by "in". Should probably be headed by the WHNP, with "in" marked as moved, as in "which we trusted in."

jeisner commented 8 years ago

[item from the old TO-DO file dated 2002-04-07]

Copular VPs should probably be headed by the predicate part. Think of the copula as a function word feature that turns an adjective or noun into a VP by giving it appropriate inflectional features (not necessarily tense, though). This should help with tough-movement.
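A rough sketch of what that head choice could look like, in Python rather than the repository's own scripts. The function name, the `(label, head_word)` representation of children, and the list of copula forms are all assumptions for illustration; the only Treebank fact relied on is the `-PRD` function tag that marks predicates.

```python
# Hypothetical list of copula forms; a real implementation would likely
# need a more careful test (for example, excluding auxiliary "be").
COPULA_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being",
                "'s", "'re", "'m"}

def vp_head_child(children):
    """Pick the index of the head child of a VP.

    children: list of (label, head_word) pairs, one per child of the VP.
    If the verb is a copula and some sibling carries the -PRD tag,
    the predicate is chosen as head; otherwise the verb is.
    """
    verb_idx = next(
        (i for i, (lab, _) in enumerate(children) if lab.startswith("VB")),
        None,
    )
    if verb_idx is not None and children[verb_idx][1].lower() in COPULA_FORMS:
        for i, (lab, _) in enumerate(children):
            if "-PRD" in lab:
                return i
    # Fall back to the verb, or to the first child if there is no verb.
    return verb_idx if verb_idx is not None else 0

print(vp_head_child([("VBZ", "is"), ("ADJP-PRD", "tough")]))   # predicate wins
print(vp_head_child([("VBD", "called"), ("S", "UNIX")]))       # verb stays head
```

With the predicate as head, the copula becomes a dependent that contributes inflectional features, which is the arrangement the note above argues for.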

jeisner commented 8 years ago

[item from the old TO-DO file dated 2002-04-07]

Munge corpus to let "less" and "than" be sisters, or parent and child, in "less X than Y". (Currently, X is chosen as the head child, so "less" and "than" can't see each other.) Similarly for other paired comparatives.
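One way to picture the "parent and child" option is on the extracted dependencies rather than on the trees themselves. The Python sketch below is only an illustration under stated assumptions: the `PAIRS` table (including "more ... than" and "as ... as" as examples of other paired comparatives), the function name, and the head-index representation are all hypothetical and not taken from the scripts.

```python
# Hypothetical table of paired comparative words: opener -> partner.
PAIRS = {"less": "than", "more": "than", "as": "as"}

def link_comparatives(words, heads):
    """Reattach the partner word as a child of its comparative opener.

    words: list of tokens.
    heads: heads[i] is the index of the head of words[i], or -1 for the root.
    Only partners that currently share a head with the opener are moved.
    """
    new_heads = list(heads)
    for i, w in enumerate(words):
        partner = PAIRS.get(w.lower())
        if partner is None:
            continue
        for j in range(i + 1, len(words)):
            if words[j].lower() == partner and heads[j] == heads[i]:
                new_heads[j] = i      # "than" now depends on "less"
                break
    return new_heads

words = ["less", "expensive", "than", "gold"]
heads = [1, -1, 1, 2]                 # X ("expensive") heads both "less" and "than"
print(link_comparatives(words, heads))
```

In the input, "less" and "than" are both children of "expensive" and so cannot see each other; after reattachment, "than" is a direct child of "less".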

jeisner commented 8 years ago

[item from the old TO-DO file dated 2002-04-07]