Open iamhankai opened 4 years ago
@iamhankai That's what I think too.
I think the variable 'tr_data' in the code holds the noisy data, because it is sliced from data = np.array(self.noised_train_set):

    def selectClfByKFold(self,po1,po2):
        min_Rlf = float('inf')
        target_dataset = None
        data = np.array(self.noised_train_set)
        kf = KFold(n_splits=2)
        for train, test in kf.split(data):
            size = len(train)
            tr_data = data[train]
            p_y = 1.0*sum(1 for d in tr_data if d[0] == -1)/size
            py = 1.0*sum(1 for d in tr_data if d[0] == 1)/size
            Rlf = []
            for d in tr_data:
                Rlf.append(self.estLossFunction(self.true_data_map[d[1],d[2]],d[0],py,p_y,po1,po2))
            if np.mean(Rlf) < min_Rlf:
                min_Rlf = np.mean(Rlf)
                target_dataset= tr_data
        print("4. Cross-validation finished!")
        return self.trainByNormalSVM(target_dataset)

https://github.com/jamie2017/LearningWithNoisyLabels/blob/6bb1f673aed4311dbd6f206a8cbe487244d0cf32/src/TrainingModel.py#L82
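p_y and py are computed from the labels d[0] of tr_data: they are the fractions of samples labeled -1 and +1 in the fold. So if tr_data is the noisy set, both are estimated from the noisy labels.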
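To make the point about the label fractions concrete, here is a minimal sketch of how I read that computation. It is not the repository's code: the helper name label_fractions and the toy rows are made up for illustration, and I am assuming each row is laid out as (label, x1, x2), as the d[0], d[1], d[2] indexing suggests.

```python
import numpy as np

def label_fractions(rows):
    """Return (py, p_y): the share of rows labeled +1 and -1.

    Assumes the label sits in column 0, as d[0] does in the snippet above.
    """
    rows = np.asarray(rows)
    labels = rows[:, 0]
    size = len(rows)
    py = float(np.sum(labels == 1)) / size    # fraction labeled +1
    p_y = float(np.sum(labels == -1)) / size  # fraction labeled -1
    return py, p_y

# Hypothetical noisy training rows in the assumed (label, x1, x2) layout.
noisy_rows = np.array([
    [ 1, 0.2, 0.7],
    [-1, 0.9, 0.1],
    [ 1, 0.4, 0.5],
    [ 1, 0.3, 0.8],
])

py, p_y = label_fractions(noisy_rows)
print(py, p_y)  # 0.75 0.25
```

Whatever labels are stored in column 0 decide both numbers, so feeding in noisy rows gives noisy-label proportions.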