tabular data / noisy instances / new data

xingruiyu / coteaching_plus

ICML'19 How does Disagreement Help Generalization against Label Corruption?


Open nazaretl opened 2 years ago

nazaretl commented 2 years ago

Hi, thanks for sharing your implementation. I have some questions about it:

Does it also work on tabular data?
Is the code tailored to the datasets used in the paper or can one apply it to any data?
Is it possible to identify the noisy instances (return the noisy IDs or the clean set)?

Thanks!

H-Jamieu commented 1 month ago

I am not one of the authors, so this answer reflects my own understanding and may not be correct.

Possible after modification.
As far as I understand, loss.py should be applicable to any data whose loss function is cross-entropy (CE).
In this method, filtering out noisy IDs is no different from using the small-loss trick. Just rank the samples by loss and flag the ones at the bottom of the ranking, i.e. those with the largest loss (e.g. the last 5%), as possibly noisy. You can aggregate the noisy candidates over epochs and analyse which samples frequently have a large loss.
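
To make the last point concrete, here is a minimal sketch of the ranking idea in plain NumPy. It is not code from this repository: the function names (`per_sample_ce`, `flag_large_loss`, `aggregate_flags`), the 5% threshold, and the toy data are all my own assumptions for illustration. It assumes you can record the logits (or the per-sample CE loss) for every training sample at each epoch.

```python
import numpy as np

def per_sample_ce(logits, labels):
    """Cross-entropy of each sample, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def flag_large_loss(losses, fraction=0.05):
    """Indices of the `fraction` of samples with the largest loss."""
    n_flag = max(1, int(round(fraction * len(losses))))
    return np.argsort(losses)[-n_flag:]

def aggregate_flags(loss_history, fraction=0.05):
    """Count how often each sample is flagged across epochs.

    loss_history has shape (n_epochs, n_samples).
    """
    loss_history = np.asarray(loss_history)
    counts = np.zeros(loss_history.shape[1], dtype=int)
    for epoch_losses in loss_history:
        counts[flag_large_loss(epoch_losses, fraction)] += 1
    return counts

# Toy data (hypothetical): 200 samples, 3 classes, first 10 labels flipped.
rng = np.random.default_rng(0)
n_samples, n_classes, n_epochs = 200, 3, 5
true_labels = rng.integers(0, n_classes, size=n_samples)
observed_labels = true_labels.copy()
observed_labels[:10] = (true_labels[:10] + 1) % n_classes  # injected label noise

# Pretend the model predicts the true class confidently at every epoch.
loss_history = []
for _ in range(n_epochs):
    logits = 5.0 * np.eye(n_classes)[true_labels]
    logits = logits + rng.normal(scale=0.3, size=logits.shape)
    loss_history.append(per_sample_ce(logits, observed_labels))

counts = aggregate_flags(loss_history, fraction=0.05)
suspects = np.flatnonzero(counts == n_epochs)  # flagged in every epoch
print("Samples flagged in every epoch:", suspects)
```

In a real run the losses would come from the network during training rather than from synthetic logits, and the threshold would need tuning; if you know the noise rate, that is probably a better choice than a fixed 5%.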