kulukimak / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Universal phrase tag set (constituents) #268

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This might be something for coarse-grained tags on constituents:

https://github.com/aaronlifenghan/A-Universal-Phrase-Tagset

Original issue reported on code.google.com by richard.eckart on 23 Oct 2013 at 5:39

GoogleCodeExporter commented 9 years ago
Most of it looks good and will work for German as well (and maybe for other
languages; this has to be investigated).

However, I find the naming of these two tags not well motivated:

CONJP (conjunction phrase), COP (coordinated phrase)

and the mapping they defined does not make sense to me:

PTB CONJP -> CONJP

PTB UCP -> COP (UCP stands for unlike coordination, i.e. the conjuncts are not of the same syntactic category)
French Treebank COORD -> COP
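A minimal Python sketch of the mapping questioned above, restricted to the three entries quoted in this comment (the paper's full mapping is not reproduced). The `FTB` abbreviation for the French Treebank and the helper names `to_universal` and `collapsed_tags` are hypothetical, chosen only for illustration. The sketch makes the objection concrete: source tags with different meanings end up under the same universal tag.

```python
from typing import Optional

# Only the entries quoted in this thread; "FTB" is an assumed abbreviation
# for the French Treebank, not an identifier taken from the paper.
PHRASE_TAG_MAPPING: dict[tuple[str, str], str] = {
    ("PTB", "CONJP"): "CONJP",  # conjunction phrase
    ("PTB", "UCP"): "COP",      # unlike coordination -> coordinated phrase
    ("FTB", "COORD"): "COP",    # any coordination -> coordinated phrase
}


def to_universal(treebank: str, tag: str) -> Optional[str]:
    """Return the universal phrase tag, or None if no mapping entry exists."""
    return PHRASE_TAG_MAPPING.get((treebank, tag))


def collapsed_tags(universal_tag: str) -> list[tuple[str, str]]:
    """List every (treebank, tag) pair that maps onto the given universal tag."""
    return sorted(k for k, v in PHRASE_TAG_MAPPING.items() if v == universal_tag)
```

For example, `collapsed_tags("COP")` returns `[("FTB", "COORD"), ("PTB", "UCP")]`: after mapping, a consumer can no longer tell whether a COP node was unlike coordination in the PTB or any kind of coordination in the French Treebank.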

See the slides of the paper:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation

Background on coordination:
http://faculty.washington.edu/fxia/LAWVI/workshop_presentation_slides/paper_session5/coordination.pdf

They define coordination as
"A syntactic structure consisting of two or more elements
(conjuncts), with one or more conjuncts typically, but not
always preceded by a coordinating conjunction"
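To illustrate the distinction that a single COP tag hides, here is a rough sketch that classifies the child labels of a constituent as like or unlike coordination. It assumes PTB-style labels (`CC`, `CONJP`, and `,`/`:` as separators), and the name `classify_coordination` is hypothetical. Because it only recognizes coordination when a conjunction label is present, it misses asyndetic coordination, which the quoted definition explicitly allows ("typically, but not always").

```python
# Assumed PTB-style labels for conjunctions and separators between conjuncts.
CONJUNCTION_LABELS = frozenset({"CC", "CONJP"})
SEPARATOR_LABELS = frozenset({",", ":"})


def classify_coordination(children: list[str]) -> str:
    """Return 'none', 'like', or 'unlike' for a sequence of child labels.

    Coordination is assumed only if a conjunction label is present and at
    least two conjuncts remain once conjunctions and separators are removed.
    """
    has_conjunction = any(c in CONJUNCTION_LABELS for c in children)
    conjuncts = [
        c for c in children
        if c not in CONJUNCTION_LABELS and c not in SEPARATOR_LABELS
    ]
    if not has_conjunction or len(conjuncts) < 2:
        return "none"
    # Like coordination: all conjuncts share one syntactic category.
    return "like" if len(set(conjuncts)) == 1 else "unlike"
```

`classify_coordination(["ADJP", "CC", "NP"])` returns `"unlike"`, the case the PTB labels UCP, while `["NP", "CC", "NP"]` returns `"like"`; a mapping that sends both to the same coordination tag loses this difference.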

-- Judith

Original comment by eckle.kohler on 23 Oct 2013 at 6:47

GoogleCodeExporter commented 9 years ago
If the idea of universal tags catches on, there'll probably be more work on
this. We can collect further references here until we feel it is a good time to
act. In any case, we could also just define our own categories. After all,
we're already collecting definitions for various tag sets, which we could
compare at some point. However, I don't see us doing a real evaluation of the
impact such categories could have, as e.g. Petrov et al. did for the POS tags.

Original comment by richard.eckart on 23 Oct 2013 at 6:51

GoogleCodeExporter commented 9 years ago
Dear Judith,

the address should be:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation-41998520

instead of:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation

Best regards.
NLPer

Original comment by hanlifen...@gmail.com on 25 Nov 2014 at 12:46

GoogleCodeExporter commented 9 years ago
A Universal Phrase Tagset for Multilingual Treebanks
Springer
October 20, 2014
Many syntactic treebanks and parser toolkits have been developed over the past
twenty years, including dependency structure parsers and phrase structure
parsers. Phrase structure parsers usually use different phrase tagsets for
different languages, which is inconvenient when conducting multilingual
research. This paper designs a refined universal phrase tagset that contains 9
commonly used phrase categories. Furthermore, the mapping covers 25 constituent
treebanks and 21 languages. The experiments show that the universal phrase
tagset can generally reduce the costs in the parsing models and even improve
the parsing accuracy.

In M. Sun et al. (Eds.): CCL and NLP-NABD 2014, LNAI 8801, pp. 247–258, 2014.
© Springer International Publishing Switzerland 2014

http://link.springer.com/chapter/10.1007%2F978-3-319-12277-9_22

Original comment by richard.eckart on 16 Jan 2015 at 9:50