For several goals, the classifier performs poorly on academic research projects, as shown in this notebook, which uses manually verified data. To improve the classifier, we will create a labelled dataset using an annotation tool. The dataset should:
Be relatively balanced
Contain projects that span a range of likelihoods of belonging to a goal
To build it, we use a mixture of keywords and classification probabilities to generate 16 datasets, each of which can be manually labelled with binary classifications.
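
One way to combine keywords and classification probabilities when drawing projects for annotation is sketched below. This is a minimal illustration, not the project's actual sampling code: the record keys (`id`, `text`, `prob`), the function names, the bin count and the per-cell sample size are all assumptions made for the example. It groups projects by probability bin and by whether they contain a keyword, then samples the same number from each group, so the result covers the full probability range and includes both keyword hits and misses.

```python
import random
from collections import defaultdict


def has_keyword(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def build_annotation_sample(projects, keywords, n_bins=5, per_cell=2, seed=0):
    """Sample projects evenly across probability bins and keyword hit/miss.

    projects: list of dicts with the hypothetical keys "id", "text" and
    "prob", where "prob" is the classifier's probability for one goal.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for project in projects:
        # Equal-width bins over [0, 1]; prob == 1.0 falls in the last bin.
        bin_index = min(int(project["prob"] * n_bins), n_bins - 1)
        cells[(bin_index, has_keyword(project["text"], keywords))].append(project)

    sample = []
    for key in sorted(cells):
        group = cells[key]
        # Take at most per_cell projects from each (bin, keyword) cell.
        sample.extend(rng.sample(group, min(per_cell, len(group))))
    return sample
```

Running a function like this once per goal, with that goal's keywords and probabilities, would give one candidate dataset per goal to load into the annotation tool. Sampling equally from every cell is only a rough proxy for class balance, since the true labels are unknown until annotation is complete.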