vgcagle / DARC

Create v1 synthetic data set #8

Closed by divilian 2 years ago

divilian commented 2 years ago

Write code to generate a synthetic data set of very simple documents, each 38 words or fewer, built from a small set of sentence patterns, with a small set of organization names to use as key nouns. Label half the set "positive" for an A_supplies_B relation and the other half "negative."
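For illustration, here is a minimal sketch of what such a generator might look like. It is not the code in this repo: the organization names, the sentence patterns, and the function names `make_document` and `make_dataset` are all made up for the example. Only the 38-word cap, the two-organization documents, and the half positive / half negative split come from the description above.

```python
import random

# Hypothetical organization names and sentence patterns (not from the repo).
ORGS = ["Acme", "Globex", "Initech", "Umbrella", "Hooli"]

POSITIVE_PATTERNS = [
    "{a} supplies parts to {b}.",
    "{b} buys its components from {a}.",
    "{a} is a supplier of {b}.",
]
NEGATIVE_PATTERNS = [
    "{a} competes with {b}.",
    "{a} and {b} announced earnings today.",
    "{a} sued {b} last year.",
]

MAX_WORDS = 38  # word cap stated in the issue


def make_document(label, rng):
    """Fill one pattern with two distinct organizations and return (text, label)."""
    a, b = rng.sample(ORGS, 2)
    patterns = POSITIVE_PATTERNS if label == "positive" else NEGATIVE_PATTERNS
    text = rng.choice(patterns).format(a=a, b=b)
    # Guard against a pattern that would break the word cap.
    assert len(text.split()) <= MAX_WORDS
    return text, label


def make_dataset(n, seed=0):
    """Return n (text, label) pairs, half positive and half negative, shuffled."""
    rng = random.Random(seed)
    half = n // 2
    labels = ["positive"] * half + ["negative"] * (n - half)
    docs = [make_document(label, rng) for label in labels]
    rng.shuffle(docs)
    return docs
```

Seeding the generator keeps the output reproducible, which helps when comparing model runs on the same synthetic set.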

divilian commented 2 years ago

See the new synthetic directory, @rpersing. Running synthesize.py with no arguments creates files in two subdirectories of a data directory: one called A_sup_B and the other called none. The former contains one file per (tiny) document in which an A-supplies-B relation is present, and the latter contains the remaining (tiny) documents. Every document in both groups mentions exactly two organizations.
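As a rough sketch of how that layout could be read back in, the snippet below walks the two subdirectories and uses the subdirectory name as the label. The function name `load_dataset` and the default path `data` are assumptions for the example, and since the file naming is not described here, it simply reads every regular file it finds.

```python
from pathlib import Path

# Subdirectory names as described in the comment above.
LABEL_DIRS = ("A_sup_B", "none")


def load_dataset(data_dir="data"):
    """Return a list of (text, label) pairs, one per document file.

    The label is the name of the subdirectory the file was found in.
    The data_dir default is an assumption, not taken from synthesize.py.
    """
    examples = []
    for label in LABEL_DIRS:
        for path in sorted(Path(data_dir, label).glob("*")):
            if path.is_file():
                text = path.read_text(encoding="utf-8").strip()
                examples.append((text, label))
    return examples
```

Calling `load_dataset()` from the directory that contains `data` should then give one labeled pair per generated document.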