Using some kind of clustering algorithm to predict a class per document. Classes may be genre, topic, usefulness, etc. Finding the closest cluster per document relies on a distance metric.
Objectives
Implement different clustering algorithms to classify documents into an arbitrary set of classes. Text similarity would be a good starting point as the distance metric utilized.
Use zero-shot learning (ZSL) to classify documents from a group of pre-determined classes. HuggingFace has a pipeline for that. Checkout the comments in here.
Description
Using some kind of clustering algorithm to predict a class per document. Classes may be genre, topic, usefulness, etc. Finding the closest cluster per document relies on a distance metric.
Objectives