jamie2017 / LearningWithNoisyLabels

Implementation of a state-of-the-art algorithm from the paper “Learning with Noisy Labels”, which is the first to provide “guarantees for risk minimization under random label noise without any assumption on the true distribution.”

The calculation is wrong #1

Open iamhankai opened 4 years ago

iamhankai commented 4 years ago

https://github.com/jamie2017/LearningWithNoisyLabels/blob/6bb1f673aed4311dbd6f206a8cbe487244d0cf32/src/TrainingModel.py#L82

p_y and py are related to the labels.

randydkx commented 3 years ago

@iamhankai That's what I think too.

evileleven commented 2 weeks ago

I think the variable 'tr_data' in the code holds the noisy data, because it is sliced from data = np.array(self.noised_train_set):

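import numpy as np
from sklearn.model_selection import KFold

def selectClfByKFold(self, po1, po2):
    min_Rlf = float('inf')
    target_dataset = None
    # Folds are drawn from the noised training set, so d[0] is the label stored there.
    data = np.array(self.noised_train_set)
    kf = KFold(n_splits=2)
    for train, test in kf.split(data):
        size = len(train)
        tr_data = data[train]
        # Fractions of -1 and +1 labels in this training fold.
        p_y = 1.0 * sum(1 for d in tr_data if d[0] == -1) / size
        py = 1.0 * sum(1 for d in tr_data if d[0] == 1) / size
        # Estimated loss for every sample in the fold.
        Rlf = []
        for d in tr_data:
            Rlf.append(self.estLossFunction(self.true_data_map[d[1], d[2]], d[0], py, p_y, po1, po2))
        # Keep the fold with the smallest mean estimated loss.
        mean_Rlf = np.mean(Rlf)
        if mean_Rlf < min_Rlf:
            min_Rlf = mean_Rlf
            target_dataset = tr_data
    print("4. Cross-validation finished!")
    return self.trainByNormalSVM(target_dataset)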
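
To make the point about the labels concrete: p_y and py above are the fractions of -1 and +1 labels in tr_data, so if tr_data comes from the noised set they describe the noisy label distribution, not the clean one. The sketch below is only an illustration under an assumed class-conditional noise model with known flip rates. The function names and the parameters rho_pos and rho_neg are hypothetical and do not come from the repository, and whether they correspond to po1 and po2 is not something this thread confirms.

```python
def label_fractions(labels):
    """Return (fraction of -1 labels, fraction of +1 labels)."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    neg = sum(1 for y in labels if y == -1) / n
    pos = sum(1 for y in labels if y == 1) / n
    return neg, pos


def clean_positive_prior(noisy_pos_fraction, rho_pos, rho_neg):
    """Recover pi = P(true = +1) from the noisy +1 fraction.

    Assumed noise model (hypothetical parameters):
      rho_pos = P(noisy = -1 | true = +1)
      rho_neg = P(noisy = +1 | true = -1)
    Under it, P(noisy = +1) = (1 - rho_pos) * pi + rho_neg * (1 - pi),
    which is solved for pi here.
    """
    denom = 1.0 - rho_pos - rho_neg
    if denom <= 0:
        raise ValueError("requires rho_pos + rho_neg < 1")
    return (noisy_pos_fraction - rho_neg) / denom


if __name__ == "__main__":
    noisy_labels = [1, 1, 1, -1]
    p_y, py = label_fractions(noisy_labels)
    print(p_y, py)
    print(clean_positive_prior(py, rho_pos=0.1, rho_neg=0.2))
```

With unequal flip rates the noisy fraction and the clean prior generally differ, which is why it matters which label set these two quantities are computed from.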