nasa-petal / PeTaL-labeller

The PeTaL labeler labels journal articles with biomimicry functions.
https://petal-labeller.readthedocs.io/en/latest/
The Unlicense

Try using a tree of multilabel classifiers #81

Open: bruffridge opened this issue 3 years ago

bruffridge commented 3 years ago

Our current approach, an ensemble of single-label classifiers, ignores the taxonomy information. If we instead set up a tree of multilabel classifiers, we could enforce the taxonomy. I talked this over with David, and here’s how it might go: a paper is passed to a root classifier, which predicts which Level 1 labels it belongs to (e.g., “protect from harm”, “move”). For each Level 1 label the root classifier predicts, the paper is routed to a specialized classifier. For example, one classifier would take only “move” papers and predict which Level 2 labels they belong to (e.g., “move_through/on_solids”), and so on, recursively, until we end up with a set of Level 3 labels. You could think of the whole system as a hierarchy of gates: a whole slew of papers arrives at the main entrance (the root classifier) and is filtered down to their respective destinations (labels) by increasingly specialized gates.
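A minimal Python sketch of the routing idea, assuming each internal node of the taxonomy has its own multilabel classifier exposed as a callable that returns the set of child labels it predicts. The names `route`, `keyword_gate`, and `Gate` are hypothetical, as are all labels except “move”, “protect_from_harm”, and “move_through/on_solids”; the keyword rules are only a stand-in for trained models.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interface: a gate takes a paper's text and returns the child
# labels it predicts (multilabel, so zero or more labels).
Gate = Callable[[str], Set[str]]

def route(paper: str, node: str,
          gates: Dict[str, Gate],
          children: Dict[str, List[str]]) -> Set[str]:
    """Pass a paper down the label tree and return the leaf labels it reaches."""
    kids = children.get(node, [])
    if not kids:
        # No children: this node is a leaf (a Level 3 label).
        return {node}
    # Discard any prediction that is not a child of this node, so the
    # result can never violate the taxonomy.
    predicted = gates[node](paper) & set(kids)
    leaves: Set[str] = set()
    for label in predicted:
        # Recurse into the more specialized gate for each predicted label.
        leaves |= route(paper, label, gates, children)
    return leaves

def keyword_gate(rules: Dict[str, List[str]]) -> Gate:
    """Toy stand-in for a trained multilabel model: predicts a label if any
    of its keywords occurs in the lowercased paper text."""
    def predict(paper: str) -> Set[str]:
        text = paper.lower()
        return {label for label, words in rules.items()
                if any(word in text for word in words)}
    return predict

# Toy three-level taxonomy; the Level 3 names are hypothetical.
children = {
    "root": ["move", "protect_from_harm"],
    "move": ["move_through/on_solids", "move_in/on_liquids"],
    "move_through/on_solids": ["crawl", "burrow"],
    "move_in/on_liquids": ["swim"],
    "protect_from_harm": ["protect_from_heat"],
    "protect_from_heat": ["insulate"],
}
gates = {
    "root": keyword_gate({"move": ["locomotion", "crawl", "burrow", "swim"],
                          "protect_from_harm": ["heat", "predator"]}),
    "move": keyword_gate({"move_through/on_solids": ["crawl", "burrow", "sand"],
                          "move_in/on_liquids": ["swim", "water"]}),
    "move_through/on_solids": keyword_gate({"crawl": ["crawl"], "burrow": ["burrow"]}),
    "move_in/on_liquids": keyword_gate({"swim": ["swim"]}),
    "protect_from_harm": keyword_gate({"protect_from_heat": ["heat"]}),
    "protect_from_heat": keyword_gate({"insulate": ["insulat"]}),
}

paper = "Penguins swim in cold water and insulate against heat loss."
print(sorted(route(paper, "root", gates, children)))  # → ['insulate', 'swim']
```

Because each gate's predictions are intersected with that node's actual children, a paper can only reach a Level 3 label through its real Level 1 and Level 2 ancestors, which is how this structure enforces the taxonomy. A paper can still reach several leaves, since every gate is multilabel.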

Related to, and potentially supersedes, #74.

bruffridge commented 3 years ago

As currently described, the ensemble of single-label classifiers ignores the relationships encoded in the hierarchy. In contrast, another proposed innovation, a tree of multilabel classifiers, would encode the hierarchy explicitly in its own structure and thereby enforce the taxonomy. In this design, a biology paper is passed to a root classifier, which predicts which Level 1 labels apply to the paper (e.g., “protect from harm”, “move”). For each Level 1 label that the root classifier predicts, the paper is routed to a more specialized classifier. For example, one classifier would consider only “move” papers and predict which Level 2 labels apply to them (e.g., “move through or on solids”), and so on, recursively, until the process ends with a set of Level 3 labels. Intuitively, the system is a hierarchy of gates: a whole slew of papers arrives at the main entrance (the root classifier) and is filtered down to their respective destinations (Level 3 labels) by increasingly specialized gates.
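Training could follow the same structure. A hedged sketch, assuming gold labels are stored as sets of Level 3 labels and the tree is a `children` mapping (both hypothetical formats): each gate is trained only on the papers that belong under it (the root sees every paper), and its targets are the paper's labels among that gate's children. `ancestors_closure` and `node_training_sets` are illustrative names, not part of PeTaL.

```python
from typing import Dict, List, Set, Tuple

# (paper text, set of gold labels); hypothetical data format.
Example = Tuple[str, Set[str]]

def ancestors_closure(labels: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Return the labels together with all their ancestors, excluding the root."""
    closed: Set[str] = set()
    for label in labels:
        # Walk upward until reaching a node with no parent (the root).
        while label in parent:
            closed.add(label)
            label = parent[label]
    return closed

def node_training_sets(papers: List[Example],
                       children: Dict[str, List[str]],
                       root: str = "root") -> Dict[str, List[Example]]:
    """Build one multilabel training set per internal node of the label tree."""
    parent = {kid: node for node, kids in children.items() for kid in kids}
    datasets: Dict[str, List[Example]] = {node: [] for node in children}
    for text, gold in papers:
        labels = ancestors_closure(gold, parent) | {root}
        for node, kids in children.items():
            # A gate only sees papers that belong under it in the taxonomy;
            # its targets are the paper's labels among the gate's children.
            if node in labels:
                datasets[node].append((text, labels & set(kids)))
    return datasets

# Toy taxonomy; the Level 3 names are hypothetical.
children = {
    "root": ["move", "protect_from_harm"],
    "move": ["move_through/on_solids", "move_in/on_liquids"],
    "move_through/on_solids": ["burrow"],
    "move_in/on_liquids": ["swim"],
    "protect_from_harm": ["protect_from_heat"],
    "protect_from_heat": ["insulate"],
}
papers = [("Sandfish lizards burrow through sand.", {"burrow"}),
          ("Penguins swim and insulate against heat loss.", {"swim", "insulate"})]

training_sets = node_training_sets(papers, children)
for node, rows in training_sets.items():
    print(node, [sorted(targets) for _, targets in rows])
```

Filtering each gate's data this way mirrors inference: the “move” gate never has to learn to reject papers that the root gate would not send to it.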