rajatsen91 / CCIT

Classifier Conditional Independence Test: A CI test that uses a binary classifier (XGBoost) for CI testing

Bootstrap data is not being used #7

Open gaabrielfranco opened 2 years ago

gaabrielfranco commented 2 years ago

In the function XGBOUT2, you have the following code:

num_samp = len(all_samples)
if bootstrap:
    np.random.seed()
    random.seed()
    I = np.random.choice(num_samp, size=num_samp, replace=True)
    samples = all_samples[I, :]
else:
    samples = all_samples
Xtrain, Ytrain, Xtest, Ytest, CI_data = CI_sampler_conditional_kNN(
    all_samples[:, Xcoords],
    all_samples[:, Ycoords],
    all_samples[:, Zcoords],
    train_samp,
    k,
)

You create the variable samples when bootstrap is True, but when you call CI_sampler_conditional_kNN, you pass all_samples. In my understanding, you should pass samples here, so that the bootstrap resample is actually used. Am I right?
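To make the suggested fix concrete, here is a minimal sketch (my own helper, not the repo's code) of the intended behavior: the resampled array, not the original, should be what flows downstream when bootstrap is on.

```python
import numpy as np

def resample_bootstrap(all_samples, bootstrap=True, seed=None):
    """Return a bootstrap resample of the rows, or the original array.

    The resampled array is what should then be sliced into X/Y/Z
    coordinates and handed to CI_sampler_conditional_kNN.
    """
    num_samp = len(all_samples)
    if bootstrap:
        rng = np.random.default_rng(seed)
        idx = rng.choice(num_samp, size=num_samp, replace=True)
        return all_samples[idx, :]  # rows drawn with replacement
    return all_samples
```

With this, the downstream call would use `samples[:, Xcoords]`, `samples[:, Ycoords]`, `samples[:, Zcoords]` instead of slicing `all_samples` directly.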

BTW, this is an excellent paper!

pshvechikov commented 2 years ago

Second that. The next lines also look strange to me: why does the code use a classifier with custom hyperparameters or with the defaults, depending on the dimension of Xtrain?

    if bootstrap:
        np.random.seed()
        random.seed()
        I = np.random.choice(num_samp,size = num_samp, replace = True)
        samples = all_samples[I,:]
    else:
        samples = all_samples
    Xtrain,Ytrain,Xtest,Ytest,CI_data = CI_sampler_conditional_kNN(all_samples[:,Xcoords],all_samples[:,Ycoords], None,train_samp,k)
    s1,s2 = Xtrain.shape
    if s2 >= 4:
        model = xgb.XGBClassifier(nthread=nthread,learning_rate =0.02, n_estimators=bp['n_estimator'], max_depth=bp['max_depth'],min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=bp['colsample_bytree'],objective= 'binary:logistic',scale_pos_weight=1, seed=11)
    else:
        model = xgb.XGBClassifier()
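To isolate the branching being questioned, here is a minimal sketch (my own helper, not part of the repo; `make_classifier_params` and `min_tuned_dim` are hypothetical names for the hard-coded threshold of 4 in the quoted code). It shows that below the threshold every tuned parameter, including the `bp` values, is silently dropped in favor of the XGBoost defaults.

```python
def make_classifier_params(n_features, bp, nthread=1, min_tuned_dim=4):
    # Mirrors the quoted branching: the tuned hyperparameters are applied
    # only when the feature dimension reaches `min_tuned_dim`; below that,
    # an empty dict means XGBClassifier() falls back to its defaults.
    if n_features >= min_tuned_dim:
        return dict(
            nthread=nthread,
            learning_rate=0.02,
            n_estimators=bp["n_estimator"],
            max_depth=bp["max_depth"],
            min_child_weight=1,
            gamma=0,
            subsample=0.8,
            colsample_bytree=bp["colsample_bytree"],
            objective="binary:logistic",
            scale_pos_weight=1,
            seed=11,
        )
    return {}  # XGBClassifier(**{}) uses the library defaults

# model = xgb.XGBClassifier(**make_classifier_params(Xtrain.shape[1], bp))
```

If the threshold is intentional (e.g. the tuned settings overfit on very low-dimensional inputs), a comment in the source explaining it would help; otherwise applying the same parameters unconditionally seems more consistent.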