Policy in original CGELBank:
It occurs to me that most common punctuation marks (commas, periods) are written orthographically with the previous word. It is less intuitive to put them in the subsequent-word node, and the tree visualization looks strange:
Nodes with punctuation are:
(V :p "(" :t "microwaved" :l "microwave" :xpos "VBN")
(Adj :p "?" :p ")" :p "," :t "heartless")
(Adj :p "," :t "tiny")
(N :t "dogs" :l "dog" :p "...")
Suggested new policy: all :p annotations will group with the preceding word, except (a) punctuation tokens containing "(" or "[", together with any run of punctuation that follows one of those, and (b) sentence-initial punctuation.
Under the new policy, the above sentence would have:
(V :p "(" :t "microwaved" :l "microwave" :xpos "VBN" :p "?" :p ")" :p ",")
(Adj :t "heartless")
(N :t "salsa" :p ",")
(Adj :t "tiny")
(N :t "dogs" :l "dog" :p "...")
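To make the proposed rule concrete, here is a rough sketch of how the attachment could be computed from a flat token list. This is not CGELBank code: `Node`, `is_punct`, `attach_punct`, and `render` are hypothetical names, the "no letters or digits means punctuation" test is an assumption, and the token order is inferred from the nodes listed above. Category labels, :l, and :xpos are left out.

```python
from dataclasses import dataclass, field

OPENERS = ("(", "[")  # exception (a): opening brackets group with the next word


@dataclass
class Node:
    text: str
    pre: list[str] = field(default_factory=list)   # :p before :t
    post: list[str] = field(default_factory=list)  # :p after :t


def is_punct(tok: str) -> bool:
    # Assumption: a token with no letters or digits counts as punctuation.
    return not any(ch.isalnum() for ch in tok)


def attach_punct(tokens: list[str]) -> list[Node]:
    nodes: list[Node] = []
    pending: list[str] = []   # punctuation waiting for the next word
    after_opener = False      # inside a punctuation run started by "(" or "["
    for tok in tokens:
        if not is_punct(tok):
            nodes.append(Node(tok, pre=pending))
            pending, after_opener = [], False
            continue
        has_opener = any(o in tok for o in OPENERS)
        if not nodes or after_opener or has_opener:
            # Exceptions (a) and (b): group with the following word.
            pending.append(tok)
            after_opener = after_opener or has_opener
        else:
            # Default: group with the preceding word.
            nodes[-1].post.append(tok)
    if pending and nodes:
        # Leftover punctuation with no following word falls back to the last word.
        nodes[-1].post.extend(pending)
    return nodes


def render(node: Node) -> str:
    parts = [f':p "{p}"' for p in node.pre]
    parts.append(f':t "{node.text}"')
    parts += [f':p "{p}"' for p in node.post]
    return "(" + " ".join(parts) + ")"


tokens = ["(", "microwaved", "?", ")", ",", "heartless",
          "salsa", ",", "tiny", "dogs", "..."]
for node in attach_punct(tokens):
    print(render(node))
```

On this token list the sketch yields the same :p grouping as the nodes above: "(" stays in front of "microwaved", the "?", ")" and "," follow it, and the second comma moves to "salsa". Quotes are deliberately not in `OPENERS`, so apart from sentence-initial position they attach to the preceding word.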
Logically, quotes could be treated like open parens/brackets, but because " and ' are ambiguous, maybe we shouldn't go there.
Thoughts? @BrettRey @bwaldon