Dataset

Hi,

The original data for the paper was removed due to a system transfer.

For guidance on dataset creation, you can consider the following steps:

1. Parse the target-side texts using an off-the-shelf parser, e.g., the Berkeley Parser.
2. Traverse the constituency parse tree in layer-wise order to obtain the constituents and their corresponding infilling texts at each tree level.
3. Construct the syntax contexts and the infilling texts from the results of step 2 (see the paper for details).
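
As a rough illustration of the layer-wise traversal, here is a minimal Python sketch. It is not the code used for the paper. It assumes the parser output is a PTB-style bracketed string, and the helper names (`parse_bracketed`, `collapse`, `levels`), the `<LABEL>` placeholder format, and the choice to fold POS tags into bare words are all my own assumptions for the example. Please check the paper for the exact format of the syntax contexts.

```python
def parse_bracketed(s):
    """Parse a PTB-style string like '(S (NP (DT the) (NN cat)) (VP (VBD sat)))'.

    Returns a word (str) or a (label, children) tuple.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def collapse(tree):
    """Assumption: fold POS-tag nodes such as (DT the) into the bare word."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    return (label, [collapse(c) for c in children])


def show(node):
    """Render a word as itself and a constituent as a <LABEL> placeholder."""
    return node if isinstance(node, str) else f"<{node[0]}>"


def levels(tree):
    """Walk the tree top-down, one layer at a time.

    For each layer, return (context, infills): the context is the sequence of
    words and constituent placeholders at that layer, and infills lists, for
    each constituent, what it expands to one layer down.
    """
    out = []
    frontier = [tree]
    while any(not isinstance(n, str) for n in frontier):
        context = [show(n) for n in frontier]
        infills = []
        next_frontier = []
        for n in frontier:
            if isinstance(n, str):
                next_frontier.append(n)  # words are carried down unchanged
            else:
                label, children = n
                infills.append((label, [show(c) for c in children]))
                next_frontier.extend(children)
        out.append((context, infills))
        frontier = next_frontier
    return out


if __name__ == "__main__":
    parsed = parse_bracketed("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
    for context, infills in levels(collapse(parsed)):
        print(context, infills)
```

Each `(context, infills)` pair would then be turned into one source/target training example, in whatever serialization the paper specifies.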

We provide a sample dataset in toy_data. Specifically, the "training" directory contains the source texts, target texts and parsing results (step 1), whereas the "training_triplets" directory contains the final sequence-to-sequence syntax-aware data for training (step 3).

yafuly / SyntacticGen

Dataset #1