phurwicz / hover

:speedboat: Label data at scale. Fun and precision included.
https://phurwicz.github.io/hover
MIT License

Phase out soft label / denoising components #24

Closed by haochuanwei 2 years ago

haochuanwei commented 2 years ago

hover itself does not produce soft labels, so cross_entropy_with_probs is only relevant when we do label smoothing. That can be achieved directly in torch, as described here. It requires torch>=1.10.0.
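For reference, a minimal sketch in plain Python of what label smoothing does to a hard label and to the resulting loss. The names smooth_labels, log_softmax and soft_cross_entropy are illustrative only; they are not hover's or torch's API. In torch>=1.10.0 the same effect is available through the label_smoothing argument of torch.nn.CrossEntropyLoss, which takes hard class indices and needs no soft-label loss.

```python
import math
from typing import List, Sequence


def smooth_labels(target: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Mix a one-hot target with the uniform distribution over classes.

    Illustrative helper (not part of hover or torch): the true class gets
    (1 - epsilon) + epsilon / num_classes, every other class epsilon / num_classes.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == target else uniform
        for k in range(num_classes)
    ]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits: Sequence[float], probs: Sequence[float]) -> float:
    """Cross entropy between predicted logits and a target distribution."""
    return -sum(q * lp for q, lp in zip(probs, log_softmax(logits)))


logits = [2.0, 0.5, -1.0]
hard_loss = soft_cross_entropy(logits, smooth_labels(0, 3, epsilon=0.0))
smoothed_loss = soft_cross_entropy(logits, smooth_labels(0, 3, epsilon=0.1))
print(hard_loss, smoothed_loss)
```

With epsilon=0.0 this reduces to the usual hard-label cross entropy, so a soft-label loss is only needed once the targets themselves are smoothed.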

The Co-teaching-based code in hover.utils.denoising is an over-stretch here: it requires too much background for the vast majority of intended users. It's hard to justify shipping a specific piece of research in a library like hover, which has almost no ties to it.

phurwicz commented 2 years ago

Agreed on the denoising point. In the active learning part we are not after SOTA, so we should look for low-hanging accuracy boosts that need no user config. Label smoothing is an example of this, so let's keep it, using the torch>=1.10.0 implementation.

Just to point out a popular tool that can help clean the noise in the labels: https://github.com/cleanlab/cleanlab

phurwicz commented 2 years ago

Completed by 9bf5f48baacb67b725342653228d4caf6cd4f713, fe629e5487a9b4877f926749ac1f4a6d85bf95d1, and ac75af1ca4a4ecc06308768dceabdf8bbf757182.