We want to get some data that we could use for projective and non-projective link grammar parsing, in multiple languages.
A starting point is to read past work on dependency parsing where the tree restriction is dropped.
I should make a list of past papers. Relevant work that I can think of includes:
2-planar parsing (e.g., Gómez-Rodríguez and Nivre's work)
Matthias Buch-Kromann's dissertation (I don't remember what he calls his method)
the hand-written link grammar parser for English (Sleator and Temperley's; it may be available for other languages too)
extracting dependencies from the Penn Treebank or other automatically parsed text that has traces. The traces allow us to reconstruct extra dependencies for movement ("what_i did you eat _ei", "the sandwich_i that you ate _ei") and control ("you _ei wanted [_ei to eat it]"), as well as non-projective dependencies for things like extraposition ("I met a man _ei yesterday [who had ...]_i").
extracting dependencies from CCGBank, which may be easier because they have already handled the traces etc. (I could ask Julia Hockenmaier.)
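To make the non-projectivity in the extraposition example above concrete, here is a minimal sketch of a crossing-arc check, the standard symptom of non-projectivity in a dependency structure. The sentence, word indices, and head choices in the example are my own illustrative assumptions, not anyone's gold annotation:

```python
# Minimal crossing-arc check for a set of dependency arcs.
# Two arcs cross when their endpoints strictly interleave:
# min(a) < min(b) < max(a) < max(b).

def crossing_arcs(arcs):
    """Return all pairs of arcs whose endpoints interleave."""
    crossings = []
    norm = [tuple(sorted(a)) for a in arcs]
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            (l1, r1), (l2, r2) = norm[i], norm[j]
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                crossings.append((arcs[i], arcs[j]))
    return crossings

def is_noncrossing(arcs):
    return not crossing_arcs(arcs)

# Extraposition example (indices and head choices assumed):
#   1=I  2=met  3=a  4=man  5=yesterday  6=who  7=had ...
# The relative clause attaches to "man" (4 -> 6), crossing the
# temporal-modifier arc "met" -> "yesterday" (2 -> 5).
arcs = [(2, 1), (2, 4), (4, 3), (2, 5), (4, 6), (6, 7)]
print(crossing_arcs(arcs))  # -> [((2, 5), (4, 6))]
```

(For trees, non-crossing plus a root-coverage condition gives full projectivity; the crossing test alone already flags the extraposition arc here.)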
As far as I know, no one has done such things with a transition-based parser before. I also think our formalism is novel: I gave a similar formalism for link grammar in a 2000 book chapter, but the idea of non-projective arcs is new, and so is the idea that a word's automaton consumes other words along with their states.
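The 2-planar restriction mentioned in the list above also has a simple operational form: the arcs must split into two sets ("planes") such that no two arcs in the same plane cross. That is exactly 2-colorability of the crossing graph (arcs as vertices, crossings as edges), which a BFS bipartiteness check decides. A sketch, with my own illustrative example arcs:

```python
from collections import deque

def _cross(a, b):
    """True if arcs a and b have strictly interleaving endpoints."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

def is_2planar(arcs):
    """Can the arcs be split into two pairwise non-crossing planes?

    Build the crossing graph and 2-color it with BFS; an odd cycle
    of crossings means no 2-plane assignment exists.
    """
    n = len(arcs)
    adj = [[j for j in range(n) if j != i and _cross(arcs[i], arcs[j])]
           for i in range(n)]
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd crossing cycle: not 2-planar
    return True

# A single crossing pair fits easily into two planes:
print(is_2planar([(2, 1), (2, 4), (4, 3), (2, 5), (4, 6), (6, 7)]))  # True

# Three mutually crossing arcs form a triangle in the crossing graph:
print(is_2planar([(1, 4), (2, 5), (3, 6)]))  # False
```

This is only a recognizer for the structural condition, not a parser; it could serve as a sanity check on whatever non-projective structures we extract from the treebanks.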