Open GoogleCodeExporter opened 9 years ago
I have added mapping files for NEGRA grammatical functions and CONLL-2008
dependency labels. Judith has added some suggestions on how these could be
mapped to the coarse-grained grammatical functions used in Uby.
Unfortunately, I didn't find documentation on the grammatical functions used in
the different versions of Tiger. Tiger seems to use a superset of the NEGRA
labels and the set appears to differ between the different versions of Tiger.
Does anybody have links to publications or documentation that specifies the
Tiger labels or is it necessary to extract them directly from the corpus meta
data?
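
If the labels do have to be pulled from the corpus itself, a minimal sketch
could look like the following. It assumes the corpus is available in a
TIGER-XML-like layout in which each syntactic function is stored as the `label`
attribute of an `<edge>` element inside a sentence element `<s>`. The function
name `collect_edge_labels` and the tiny hand-written sample are hypothetical
and only serve as illustration; they are not real corpus data.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from io import BytesIO

# Hand-written sample in an assumed TIGER-XML-like layout (not real corpus data).
SAMPLE = b"""<corpus><body>
<s id="s1"><graph root="s1_500">
<terminals>
<t id="s1_1" word="Sie" pos="PPER"/>
<t id="s1_2" word="liest" pos="VVFIN"/>
</terminals>
<nonterminals>
<nt id="s1_500" cat="S">
<edge label="SB" idref="s1_1"/>
<edge label="HD" idref="s1_2"/>
</nt>
</nonterminals>
</graph></s>
</body></corpus>"""


def collect_edge_labels(source):
    """Count the `label` attribute values of all <edge> elements.

    `source` is a file name or a binary file object.
    """
    counts = Counter()
    # iterparse streams the input, so a large corpus need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "edge":
            label = elem.get("label")
            if label is not None:
                counts[label] += 1
        elif elem.tag == "s":
            # The edges of this sentence are already counted; free the subtree.
            elem.clear()
    return counts


print(sorted(collect_edge_labels(BytesIO(SAMPLE))))
```

Running this once per Tiger release and comparing the resulting label sets
would show where the versions differ.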
Original comment by richard.eckart
on 14 Aug 2013 at 9:19
I found the following documentation about TIGER:
(source http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)
Annotation Manual:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf
Reference paper:
http://link.springer.com/content/pdf/10.1007%2Fs11168-004-7431-3.pdf
Original comment by eckle.kohler
on 14 Aug 2013 at 1:22
Differences between NEGRA and TIGER regarding syntactic (=grammatical)
functions:
TIGER introduces additional syntactic functions:
- OP for prepositional objects, i.e. prepositional phrases that are arguments
(in contrast to adjuncts) of verbs, nouns or adjectives
- PH and EP for different functions of expletive "es" in German
Original comment by eckle.kohler
on 15 Aug 2013 at 6:11
Original comment by richard.eckart
on 15 Aug 2013 at 9:51
(met with Sandra Kübler)
MappingProvider for dependencies would be
1) definitely controversial, because in contrast to POS, there is much less
agreement about common dependency types across languages
2) would have to be carefully designed, because (obviously) it entails
information loss. So this would IMHO only make sense in the context of
particular applications where you can demonstrate that the information loss
does not matter and the generalization is beneficial for the application
Original comment by eckle.kohler
on 8 Sep 2013 at 5:26
Regarding 2): it doesn't entail information loss per se, because the original
dependency information is preserved in a feature value. The mapping only
applies when selecting a specialized type instead of the generic "Dependency"
type.
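
The idea can be sketched roughly as follows. This is only an illustration and
not the project's actual implementation: the class `DependencyAnnotation`, the
function `annotate` and the entries in `LABEL_TO_TYPE` are all hypothetical
placeholders. The point is that the original label is always stored unchanged,
and the mapping only decides which type name is chosen.

```python
from dataclasses import dataclass

# Hypothetical mapping from corpus labels to specialized type names.
# The entries are placeholders, not the contents of the real mapping files.
LABEL_TO_TYPE = {
    "SB": "Subject",
    "OA": "DirectObject",
}

# Fallback type used when a label has no entry in the mapping.
GENERIC_TYPE = "Dependency"


@dataclass(frozen=True)
class DependencyAnnotation:
    type_name: str       # specialized type, or the generic fallback
    original_label: str  # corpus label, always kept as it was
    governor: str
    dependent: str


def annotate(label, governor, dependent, mapping=LABEL_TO_TYPE):
    """Pick a type via the mapping while keeping the original label."""
    type_name = mapping.get(label, GENERIC_TYPE)
    return DependencyAnnotation(type_name, label, governor, dependent)


mapped = annotate("SB", "liest", "Sie")
unmapped = annotate("OP", "wartet", "auf")
print(mapped)
print(unmapped)
```

An unmapped label such as "OP" in this sketch falls back to the generic type,
but the label itself is still available in `original_label`.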
Original comment by richard.eckart
on 8 Sep 2013 at 9:15
The CoNLL-2009 Shared Task ("Syntactic and Semantic Dependencies in Multiple
Languages") could be useful in this context.
see:
http://www.aclweb.org/anthology-new/W/W09/W09-1201.pdf
"we have prepared a unified format and data for
several very different languages, as a basis
for possible extensions towards other languages
and unified treatment of syntactic dependencies
and semantic role labeling across natural languages;"
have to look into the data, though
Original comment by eckle.kohler
on 13 Sep 2013 at 6:20
I believe they just unified the file format, not the tagset. We have readers
and writers for the file format btw (io.conll).
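
To illustrate the difference between a shared file format and a shared tagset,
here is a small sketch that collects the dependency labels from tab-separated
CoNLL-style lines. It does not use io.conll; the function `deprel_inventory` is
hypothetical, and the two samples are hand-written in the ten-column CoNLL-X
layout, where DEPREL is the eighth column (zero-based index 7). For
CoNLL-2009 files the DEPREL column would be index 10 instead.

```python
from collections import Counter

# Zero-based index of the DEPREL column in the ten-column CoNLL-X layout.
CONLLX_DEPREL = 7

# Hand-written illustrative rows, not taken from any treebank.
GERMAN_SAMPLE = [
    "1\tSie\tsie\tPPER\tPPER\t_\t2\tSB\t_\t_",
    "2\tliest\tlesen\tVVFIN\tVVFIN\t_\t0\tROOT\t_\t_",
]
ENGLISH_SAMPLE = [
    "1\tShe\tshe\tPRP\tPRP\t_\t2\tSBJ\t_\t_",
    "2\treads\tread\tVBZ\tVBZ\t_\t0\tROOT\t_\t_",
]


def deprel_inventory(lines, deprel_column):
    """Count the dependency labels found in CoNLL-style token lines."""
    labels = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators (blank lines) and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > deprel_column:
            labels[fields[deprel_column]] += 1
    return labels


german = set(deprel_inventory(GERMAN_SAMPLE, CONLLX_DEPREL))
english = set(deprel_inventory(ENGLISH_SAMPLE, CONLLX_DEPREL))
print(sorted(german - english), sorted(english - german))
```

Both samples parse with the same code, yet the label inventories only partly
overlap, which is why a common file format alone does not give a common tagset.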
Original comment by richard.eckart
on 13 Sep 2013 at 6:31
The Swedish Treebank provides an official conversion to the Stanford dependency
types: http://stp.lingfil.uu.se/~nivre/swedish_treebank/
Original comment by richard.eckart
on 15 Sep 2013 at 5:29
Original comment by richard.eckart
on 17 Sep 2013 at 2:40
This looks very interesting:
http://www.ryanmcd.com/papers/treebanksACL2013.pdf
https://code.google.com/p/uni-dep-tb/
Original comment by richard.eckart
on 19 Sep 2013 at 1:34
The harmonized label set in
http://www.ryanmcd.com/papers/treebanksACL2013.pdf
looks good. This label set is based on "the principle that content words take
function words as dependents".
We could use it to create mappings for German and other languages where we have
dependency parsers integrated. The question is how straightforward it is to
map the existing dependency tagsets of languages other than English to this
uniform label set.
This requires looking into the individual dependency tagsets used in the
different treebanks.
Original comment by eckle.kohler
on 19 Sep 2013 at 2:51
for future reference:
the paper about the Italian Stanford Dependency Treebank
http://medialab.di.unipi.it/downloads/ISDT/MIDT-STD2013_law.pdf
Original comment by eckle.kohler
on 19 Sep 2013 at 4:28
Original comment by richard.eckart
on 26 Mar 2014 at 10:51
Still not all components use the providers. Moving ahead again.
Original comment by richard.eckart
on 12 Nov 2014 at 8:47
Original issue reported on code.google.com by
richard.eckart
on 8 Nov 2012 at 4:42