Closed by divilian 2 years ago
See the new `synthetic` directory, @rpersing. If you run `synthesize.py` with no arguments, it will create files in two subdirectories of a `data` directory: one called `A_sup_B` and the other called `none`. The former contains one file per (tiny) document, each of which contains an A-supplies-B relation; the latter contains other (tiny) documents with no such relation. Every document in both groups mentions exactly two organizations.
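For anyone consuming this output, here is a minimal sketch of reading the two subdirectories back in as labeled examples. It assumes only the layout described above (`data/A_sup_B` and `data/none`, one document per file). The function name `load_labeled_docs`, the UTF-8 encoding, and the 1/0 label values are assumptions made for illustration, not part of `synthesize.py`.

```python
from pathlib import Path


def load_labeled_docs(data_dir="data"):
    """Return (text, label) pairs: label 1 for A_sup_B documents, 0 for none.

    Hypothetical helper: the encoding and label values are assumptions.
    """
    examples = []
    for subdir, label in (("A_sup_B", 1), ("none", 0)):
        folder = Path(data_dir) / subdir
        # Sort so the order is reproducible across runs and platforms.
        for path in sorted(folder.iterdir()):
            if path.is_file():
                examples.append((path.read_text(encoding="utf-8"), label))
    return examples
```

Calling `load_labeled_docs()` from the directory that contains `data` should give a list that can be fed to a classifier or split into train and test sets.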
Write code to generate a synthetic data set of very simple documents, each 38 words or fewer, based on a small set of sentence patterns and a small set of organization names to use as key nouns. Label half of the set "positive" for an A_supplies_B relation and the other half "negative."
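One way such a generator could look is sketched below. This is not the actual `synthesize.py`. The organization names, sentence patterns, file naming scheme, and the `synthesize` and `make_doc` functions are all hypothetical, chosen only to illustrate the requirements: exactly two organizations per document, at most 38 words, and an even split between positive and negative examples.

```python
import random
from pathlib import Path

# Hypothetical organization names and sentence patterns, invented for this sketch.
ORGS = ["Acme", "Globex", "Initech", "Umbrella", "Hooli", "Stark"]
POSITIVE = [
    "{a} supplies {b} with parts.",
    "{b} buys its components from {a}.",
    "{a} is a supplier to {b}.",
]
NEGATIVE = [
    "{a} and {b} are both based in Ohio.",
    "{a} competes with {b}.",
    "{a} sued {b} last year.",
]
MAX_WORDS = 38


def make_doc(positive, rng):
    """Build one tiny document that mentions exactly two distinct organizations."""
    a, b = rng.sample(ORGS, 2)
    pattern = rng.choice(POSITIVE if positive else NEGATIVE)
    text = pattern.format(a=a, b=b)
    # Guard against a pattern that would exceed the word limit.
    assert len(text.split()) <= MAX_WORDS
    return text


def synthesize(n_docs=100, out_dir="data", seed=0):
    """Write n_docs files, alternating between the A_sup_B and none folders."""
    rng = random.Random(seed)
    out = Path(out_dir)
    for i in range(n_docs):
        positive = i % 2 == 0  # even indices are positive, so the split is half and half
        folder = out / ("A_sup_B" if positive else "none")
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"doc{i:04d}.txt").write_text(make_doc(positive, rng), encoding="utf-8")


if __name__ == "__main__":
    synthesize()
```

Seeding a private `random.Random` instance keeps the output reproducible without touching the global random state, which makes it easier to regenerate the same data set later.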