Active learning - Githubissues

Active learning is not part of model training or testing. It is a technique I used to collect training data more efficiently. It prioritizes which text to annotate first, and is useful when one kind of entity is rare in your text. You can check the wiki if you are not familiar with it.

You first prepare some unannotated text and run the model on it. You need to treat the model's output for each span as a probability distribution over all possible span types. Then you rank the spans by the probability of one class, e.g. catalyst. This gives you a list where the top part has high model probability and is likely to be true positives, and the bottom part has low model probability and is likely to be true negatives. The middle part is where the model is least certain, so labels there should tell it the most. Therefore we would like to prioritize the annotation of the middle part first.

https://github.com/nsndimt/CatalysisIE/blob/e476bbf4d1faee614f92cd08161ecf38f66baddf/model.py#L155
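
The ranking step described above can be sketched roughly as follows. This is only an illustration, not the code in the repo: the function name `prioritize_spans`, the `threshold` parameter, and the example probabilities are all made up here, and "the middle part" is interpreted as the spans whose class probability is closest to 0.5, which is an assumption.

```python
def prioritize_spans(span_probs, class_index, threshold=0.5):
    """Return span indices, most uncertain first, for one span type.

    span_probs:  one probability distribution per span (list of lists).
    class_index: column of the span type to rank by (e.g. catalyst).
    threshold:   probability treated as "most uncertain" (assumed 0.5).
    """
    class_probs = [row[class_index] for row in span_probs]
    # Spans near the threshold are the "middle" of the ranked list;
    # confident positives and confident negatives end up last.
    return sorted(
        range(len(class_probs)),
        key=lambda i: abs(class_probs[i] - threshold),
    )


# Hypothetical model output: 5 spans, 3 span types.
# Column 1 stands in for the "catalyst" class.
span_probs = [
    [0.05, 0.90, 0.05],  # confident catalyst
    [0.40, 0.55, 0.05],  # uncertain
    [0.90, 0.02, 0.08],  # confident non-catalyst
    [0.50, 0.40, 0.10],  # uncertain
    [0.20, 0.75, 0.05],  # fairly confident catalyst
]

order = prioritize_spans(span_probs, class_index=1)
print(order)  # annotate spans in this order
```

You would then hand the first few spans in `order` (or the sentences containing them) to the annotator and leave the rest for later.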

nsndimt / CatalysisIE

Active learning #1