singnet / rfai-proposal

MIT License

Unsupervised Parsing #4

Open raamb opened 4 years ago

raamb commented 4 years ago

Author Anton Kolonin; Andres Suarez Madrigal

Description Design and implement a syntactic parser that generates unlabeled, undirected dependency parses for a given corpus of sentences. The parser must be unsupervised: it may be trained only on unannotated text corpora and must not contain hard-coded language rules.

Syntactic parsing of text is a crucial component of many Natural Language Processing tasks, as it helps to determine the precise meaning of a given word or component in a sentence. Current state-of-the-art parsers rely either on human-created rules (e.g. the Link Grammar Parser) or on training with large annotated treebanks (e.g. Parsey McParseface), so they require significant effort from human specialists to produce. Consequently, only the most widely used human languages have reliable syntactic parsers. This RFAI is part of an ongoing effort to learn a grammar from an unannotated text corpus in an unsupervised manner (see this paper and this repository), which would enable more powerful NLP tools for understanding any language, or even variants of a language (e.g. chatspeak or baby language).

The goal of the challenge is to produce an undirected, unlabeled dependency parser, trained without supervision, that reproduces the evaluation treebanks as closely as possible.

Acceptance Criteria

Examples are silly
0 ###LEFT-WALL### 2 are
1 Examples 2 are
2 are 3 silly
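The example above appears to give the sentence first, followed by one link per entry in the form "index word index word", with index 0 taken by the ###LEFT-WALL### token. A minimal sketch of reading that layout, assuming the sentence sits on the first line and each later line holds exactly one link (the function name `read_parse` is hypothetical, not part of the RFAI):

```python
from typing import FrozenSet, List, Set, Tuple

# An undirected link is stored as an unordered pair of token indices.
Link = FrozenSet[int]


def read_parse(text: str) -> Tuple[List[str], Set[Link]]:
    """Read one parse: a sentence line, then 'i word_i j word_j' link lines.

    Assumed layout only; the words on each link line are ignored because
    the indices already identify the tokens.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    tokens = lines[0].split()
    links: Set[Link] = set()
    for ln in lines[1:]:
        i, _left_word, j, _right_word = ln.split()
        links.add(frozenset((int(i), int(j))))
    return tokens, links


example = """Examples are silly
0 ###LEFT-WALL### 2 are
1 Examples 2 are
2 are 3 silly
"""

tokens, links = read_parse(example)
print(tokens)
print(sorted(tuple(sorted(link)) for link in links))
```

Storing each link as a `frozenset` keeps it undirected, so (1, 2) and (2, 1) are treated as the same link.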

Useful information

Related documents

Related videos

Training dataset The training dataset is a pre-cleaned collection of English-language children's books obtained from Project Gutenberg, hence referred to as “Gutenberg Children”: Gutenberg Children Corpus

Evaluation treebanks For the benefit of the participants, we also provide three different treebanks used for evaluation. Note that the requested parser must be unsupervised, so the parses in these treebanks should NOT be used during training. They can, however, be used to guide the design of your parser.
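As an illustration only, and not necessarily the measure this RFAI uses, one common way to compare a produced parse with a treebank parse is precision, recall and F1 over undirected links. The sketch below assumes each parse is given as pairs of token indices; the function name `link_scores` and the sample data are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def undirected(links: Iterable[Tuple[int, int]]) -> Set[FrozenSet[int]]:
    """Drop direction so that (a, b) and (b, a) count as the same link."""
    return {frozenset(pair) for pair in links}


def link_scores(
    predicted: Iterable[Tuple[int, int]],
    reference: Iterable[Tuple[int, int]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted links against a reference."""
    pred = undirected(predicted)
    ref = undirected(reference)
    shared = len(pred & ref)
    precision = shared / len(pred) if pred else 0.0
    recall = shared / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: the reference links for "Examples are silly" and a
# made-up parser output that gets two of the three links right.
reference = [(0, 2), (1, 2), (2, 3)]
predicted = [(2, 0), (1, 3), (3, 2)]

precision, recall, f1 = link_scores(predicted, reference)
print(precision, recall, f1)
```

Such a comparison belongs to evaluation only; as stated above, the treebank parses should not be used during training.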

Acceptance Criteria

Metrics

NON-FUNCTIONAL REQUIREMENTS

Expiration Date 20 June 2020