thulas / dac-label-noise

Label de-noising for deep learning

DAC Models and Unstructured Noise #6

Open geoffreyangus opened 4 years ago

geoffreyangus commented 4 years ago

Hello all,

First of all, congratulations on this work; I found it very compelling. I am interested in using the method to clean up a dataset that I believe contains unstructured label noise. The paper states:

To identify the samples for elimination, we train the DAC, observing the performance of the non-abstaining part of the DAC on a validation set (which we assume to be clean). As mentioned before, this non-abstaining portion of the DAC is simply the DAC with the abstention mass normalized out of the true classes. The result in Lemma 1 assures that learning continues on the true classes even in the presence of abstention. However at the point of best validation error, if there continues to be training error on the non-abstaining portion of the DAC, then this is likely indicative of label noise; it is these samples that are eliminated from the training set for subsequent training using regular cross-entropy loss.
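For concreteness, here is my current reading of that elimination step as a rough NumPy sketch. This is my own interpretation, not your implementation: I'm assuming the DAC's softmax output has the abstention mass in a final extra column, and that the flagged samples are those the renormalized (non-abstaining) classifier still gets wrong at the best-validation checkpoint.

```python
import numpy as np

def flag_noisy_samples(probs, labels):
    """probs: (N, K+1) softmax outputs, last column = abstention mass.
    Normalize the abstention mass out of the true classes, then flag
    training samples the non-abstaining portion still misclassifies."""
    class_probs = probs[:, :-1]
    class_probs = class_probs / class_probs.sum(axis=1, keepdims=True)
    preds = class_probs.argmax(axis=1)
    return preds != labels  # True = candidate for elimination

# toy example: 3 samples, 2 real classes + abstention column
probs = np.array([
    [0.6, 0.1, 0.3],   # renormalized -> predicts class 0
    [0.2, 0.3, 0.5],   # heavy abstention, renormalized -> class 1
    [0.1, 0.7, 0.2],   # predicts class 1
])
labels = np.array([0, 0, 1])
mask = flag_noisy_samples(probs, labels)  # only sample 1 is flagged
```

Is that roughly the procedure, or does "training error on the non-abstaining portion" refer to something other than misclassification under the renormalized probabilities?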

A few questions came to mind when I read this:

  1. How does this differ from simply omitting the high-loss samples of a vanilla DNN? Is that a baseline you compared against the DAC approach?
  2. Once the per-sample losses on the training set have been computed, how does one decide how many samples to eliminate? Looking at Table 1, the number of samples eliminated in each entry roughly corresponds to that experiment's noise level. Did your group choose a threshold so that the number of eliminated samples matched prior knowledge of the noise level in the dataset?
  3. If (2) is the case, do you have any suggestions for eliminating unstructured noise in my use case? We have no estimate of the proportion of noisy samples in our dataset.
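To make questions (1) and (2) concrete, the vanilla baseline I have in mind is something like the following sketch (my own code, not yours; `noise_frac` stands in for the prior knowledge of the noise level mentioned in (2), which is exactly what we lack in (3)):

```python
import numpy as np

def drop_high_loss(losses, noise_frac):
    """Baseline: keep the (1 - noise_frac) fraction of training samples
    with the lowest per-sample loss, dropping the rest as presumed noise.
    `noise_frac` must come from prior knowledge of the noise level."""
    cutoff = np.quantile(losses, 1.0 - noise_frac)
    return losses <= cutoff  # True = keep this sample

# toy example: two clearly high-loss samples among five
losses = np.array([0.1, 0.2, 5.0, 0.15, 4.0])
keep = drop_high_loss(losses, noise_frac=0.4)  # drops the two outliers
```

Without an estimate for `noise_frac`, it isn't clear to me how to set the cutoff in either this baseline or the DAC-based procedure, which is what motivates question (3).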

Please let me know. Thanks again for the excellent work on this paper; I look forward to hearing from you.