nestauk / sdg-mapping

MIT License

Create annotated dataset #12

Open georgerichardson opened 4 years ago

georgerichardson commented 4 years ago

For several goals, the classifier performs poorly on academic research projects, as shown in this notebook, which uses manually verified data. To improve the classifier, we will create a labelled dataset using an annotation tool. The dataset should:

Be relatively balanced
Contain projects that span a range of likelihoods of belonging to a goal

To build it, we will use a mixture of keywords and classification probabilities to generate 16 datasets, each of which can be manually labelled with binary classifications.
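
One possible way to draw such a sample for a single goal is sketched below. It is only an illustration, not the repository's code: the record fields (`id`, `text`, `prob`), the function names, the equal-width probability bins and the rule of preferring keyword matches within each bin are all assumptions. Sampling the same number of projects from every probability bin is meant to cover the full range of likelihoods, which should also help keep the labelled set from being dominated by obvious negatives.

```python
import random
from collections import defaultdict


def has_keyword(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def build_annotation_sample(projects, keywords, n_bins=5, per_bin=2, seed=0):
    """Pick projects for manual labelling for one goal.

    projects: list of dicts with hypothetical keys 'id', 'text' and 'prob',
    where 'prob' is the classifier's probability for the goal.
    """
    rng = random.Random(seed)

    # Group projects into equal-width probability bins.
    bins = defaultdict(list)
    for project in projects:
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(project["prob"] * n_bins), n_bins - 1)
        bins[idx].append(project)

    sample = []
    for idx in range(n_bins):
        candidates = bins.get(idx, [])
        # Shuffle, then stable-sort so keyword matches come first while the
        # order within each group stays random.
        rng.shuffle(candidates)
        candidates.sort(key=lambda p: not has_keyword(p["text"], keywords))
        sample.extend(candidates[:per_bin])
    return sample
```

Running this once per goal, with that goal's keywords and probabilities, would give one candidate set per goal to load into the annotation tool. The number of bins and the sample size per bin are placeholders to tune.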