Most of it looks good and will work for German as well (and maybe for other
languages, though that has to be investigated).
However, I find the naming of these two tags not well motivated:
CONJP (conjunction phrase), COP (coordinated phrase)
and the mapping they defined does not make sense to me:
PTB CONJP -> CONJP
PTB UCP -> COP (UCP stands for unlike coordinated phrase, i.e. the conjuncts
are not of the same syntactic category)
French Treebank COORD -> COP
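For reference, the three mappings listed above can be written down as a small
lookup table. This is only a sketch of how such a mapping might be represented;
the names PHRASE_TAG_MAPPING, map_phrase_tag and sources_by_universal_tag, and
the abbreviation "FTB" for the French Treebank, are assumptions made for
illustration and are not taken from the paper or from any existing code.

```python
# Hypothetical lookup table for the mapping discussed in this comment.
# Keys are (treebank, source tag); values are the proposed universal tags.
# "FTB" is shorthand for the French Treebank (an assumption, see above).
PHRASE_TAG_MAPPING = {
    ("PTB", "CONJP"): "CONJP",
    ("PTB", "UCP"): "COP",
    ("FTB", "COORD"): "COP",
}


def map_phrase_tag(treebank, tag):
    """Return the universal tag, or None when no mapping is listed."""
    return PHRASE_TAG_MAPPING.get((treebank, tag))


def sources_by_universal_tag(mapping):
    """Group the source tags by the universal tag they are mapped to."""
    grouped = {}
    for (treebank, tag), universal in mapping.items():
        grouped.setdefault(universal, []).append((treebank, tag))
    return grouped
```

Grouping by target tag makes the point of the objection visible: PTB UCP and
French Treebank COORD end up under the same label COP, while PTB CONJP is kept
apart.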
see slides of the paper:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation
background on coordination:
http://faculty.washington.edu/fxia/LAWVI/workshop_presentation_slides/paper_session5/coordination.pdf
they define coordination as
"A syntactic structure consisting of two or more elements
(conjuncts), with one or more conjuncts typically, but not
always preceded by a coordinating conjunction"
-- Judith
Original comment by eckle.kohler
on 23 Oct 2013 at 6:47
If the idea of universal tags catches on, there'll probably be more work on
this. We can collect further references here until we feel it is a good time to
act. In any case, we could also just define our own categories. After all,
we're already collecting tag set definitions for various tag sets which we
could compare at some point. However, I don't see us doing a real evaluation of
the impact such categories could have, as Petrov et al. did for the POS tags.
Original comment by richard.eckart
on 23 Oct 2013 at 6:51
Dear Judith,
the address should be:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation-41998520
instead of:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation
Best regards.
NLPer
Original comment by hanlifen...@gmail.com
on 25 Nov 2014 at 12:46
A Universal Phrase Tagset for Multilingual Treebanks
Springer
October 20, 2014
Many syntactic treebanks and parser toolkits are developed in the past twenty
years, including dependency structure parsers and phrase structure parsers. For
the phrase structure parsers, they usually utilize different phrase tagsets for
different languages, which results in an inconvenience when conducting the
multilingual research. This paper designs a refined universal phrase tagset
that contains 9 commonly used phrase categories. Furthermore, the mapping
covers 25 constituent treebanks and 21 languages. The experiments show that the
universal phrase tagset can generally reduce the costs in the parsing models
and even improve the parsing accuracy.
In M. Sun et al. (Eds.): CCL and NLP-NABD 2014, LNAI 8801, pp. 247–258, 2014.
© Springer International Publishing Switzerland 2014
http://link.springer.com/chapter/10.1007%2F978-3-319-12277-9_22
Original comment by richard.eckart
on 16 Jan 2015 at 9:50
Original issue reported on code.google.com by
richard.eckart
on 23 Oct 2013 at 5:39