The calculation is wrong

iamhankai commented 4 years ago

https://github.com/jamie2017/LearningWithNoisyLabels/blob/6bb1f673aed4311dbd6f206a8cbe487244d0cf32/src/TrainingModel.py#L82

p_y and py are related to the labels.

randydkx commented 3 years ago

@iamhankai That's what I think too.

evileleven commented 2 weeks ago

I think the variable 'tr_data' in the code represents the noisy data (data = np.array(self.noised_train_set)):

def selectClfByKFold(self, po1, po2):
    min_Rlf = float('inf')
    target_dataset = None
    data = np.array(self.noised_train_set)
    kf = KFold(n_splits=2)
    for train, test in kf.split(data):
        size = len(train)
        tr_data = data[train]
        p_y = 1.0 * sum(1 for d in tr_data if d[0] == -1) / size
        py = 1.0 * sum(1 for d in tr_data if d[0] == 1) / size
        Rlf = []
        for d in tr_data:
            Rlf.append(self.estLossFunction(self.true_data_map[d[1], d[2]], d[0], py, p_y, po1, po2))
        if np.mean(Rlf) < min_Rlf:
            min_Rlf = np.mean(Rlf)
            target_dataset = tr_data
    print("4. Cross-validation finished!")
    return self.trainByNormalSVM(target_dataset)
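
To make the point about which labels the fractions come from concrete, here is a small standalone sketch. It is not code from the repository. The helper names (label_fractions, flip_labels, clean_positive_prior) are made up for illustration, and it assumes class-conditional label noise with flip rates rho_pos (+1 flipped to -1) and rho_neg (-1 flipped to +1). Whether po1 and po2 in the method above play that role is also an assumption. Under that noise model, the fraction of +1 labels counted on the noisy set differs from the clean class prior, and the clean prior can be recovered only if the flip rates are known.

```python
import numpy as np


def label_fractions(labels):
    """Return (fraction of -1 labels, fraction of +1 labels)."""
    labels = np.asarray(labels)
    return float(np.mean(labels == -1)), float(np.mean(labels == 1))


def flip_labels(labels, rho_pos, rho_neg, rng):
    """Flip +1 to -1 with probability rho_pos and -1 to +1 with probability rho_neg."""
    labels = np.asarray(labels)
    u = rng.random(labels.shape[0])
    # Each sample is compared against the flip rate of its own (clean) class.
    flip = np.where(labels == 1, u < rho_pos, u < rho_neg)
    return np.where(flip, -labels, labels)


def clean_positive_prior(noisy_pos_frac, rho_pos, rho_neg):
    """Solve P(noisy = +1) = (1 - rho_pos) * pi + rho_neg * (1 - pi) for pi."""
    return (noisy_pos_frac - rho_neg) / (1.0 - rho_pos - rho_neg)


rng = np.random.default_rng(0)

# Clean labels with a positive-class prior of about 0.3 (illustrative value).
clean = np.where(rng.random(100_000) < 0.3, 1, -1)
noisy = flip_labels(clean, rho_pos=0.4, rho_neg=0.1, rng=rng)

clean_neg, clean_pos = label_fractions(clean)
noisy_neg, noisy_pos = label_fractions(noisy)
recovered_pos = clean_positive_prior(noisy_pos, rho_pos=0.4, rho_neg=0.1)

print("clean  P(y=+1):", round(clean_pos, 3))
print("noisy  P(y=+1):", round(noisy_pos, 3))
print("recovered P(y=+1):", round(recovered_pos, 3))
```

With these illustrative rates the noisy positive fraction should land near 0.6 * 0.3 + 0.1 * 0.7 = 0.25 rather than 0.3, so fractions counted on 'tr_data' describe the noisy labels, not the clean ones.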
