janmbuys / DeepDeepParser

Neural Semantic Graph Parser
Apache License 2.0

where is the original data? #1

SeekPoint closed 6 years ago

janmbuys commented 6 years ago

http://svn.delph-in.net/erg/tags/1214/

goodmami commented 6 years ago

More specifically, the Redwoods Treebank data is in subdirectories under http://svn.delph-in.net/erg/tags/1214/tsdb/gold/. The http://svn.delph-in.net/erg/tags/1214/etc/redwoods.xls file explains the train/dev/test splits, although I don't find it very clear. I spelled out these splits in the following script, which may be useful for you: https://github.com/goodmami/mrs-to-penman/blob/master/convert-redwoods.sh#L4-L187

oepen commented 6 years ago

i would have to check the details of experimental settings in the original buys & blunsom (2017) paper, but i recall jan was experimenting with two different sets of EDSs, one he had converted himself, the other from data that i had prepared and (later on) released.

probably the easiest way for you to get started without having to do data conversion of your own is to use the EDSs included in the Open SDP 1.2 release:

http://sdp.delph-in.net/index.php?page=5

this package provides the official SDP export from the 1214 release of the Redwoods Treebank, some 37,000 EDSs for sections 00–21 of the venerable Wall Street Journal corpus, in three serializations (‘native’ EDS, JSON, and AMR-like). for replicability, i would recommend using this data; the training–development–evaluation splits correspond to the standard Redwoods assumptions: sections 00–19, 20, and 21, respectively.
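As a rough illustration of the split described above, here is a minimal Python sketch that maps a WSJ section number to its split. The function name `split_for_section` and the split labels `train`, `dev`, and `test` are hypothetical, chosen only for this example; how the section number is recovered from an item in the release is not covered here and would need to be checked against the package itself.

```python
def split_for_section(section: int) -> str:
    """Map a WSJ section number (0-21) to a split name.

    Follows the split stated in the thread: sections 00-19 for training,
    20 for development, 21 for evaluation. The labels are illustrative
    assumptions, not names used by the SDP release.
    """
    if 0 <= section <= 19:
        return "train"
    if section == 20:
        return "dev"
    if section == 21:
        return "test"
    # Sections outside 00-21 are not part of the data described here.
    raise ValueError(f"section {section} is outside the 00-21 range")


if __name__ == "__main__":
    # Group all 22 sections by split to sanity-check the mapping.
    by_split = {}
    for sec in range(22):
        by_split.setdefault(split_for_section(sec), []).append(f"{sec:02d}")
    for name, sections in by_split.items():
        print(name, " ".join(sections))
```

Rejecting out-of-range sections with an error, rather than defaulting them to training, keeps stray data from silently leaking into a split.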
