Hi,
Thank you for your interest! I apologize for my late response.
However, now it seems that the model can only early stop based on the loss,
Correct!
so I want to know how to modify the code so that I can use the custom metric PR AUC (area under the precision-recall curve).
You have to adjust the test_nn function and the early-stopping check. Here is an example using sklearn.metrics.roc_auc_score:
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

def test_nn(model, device, test_loader, criterion, task='class'):
    model.eval()
    test_loss = 0
    all_outputs, all_targets = [], []
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device).float(), target.to(device).long()
            output = model(data)
            if task == 'class':
                test_loss += criterion(output, target).item()  # sum up batch loss
                all_outputs.append(output[:, 1].detach().cpu().numpy())  # positive-class score
            else:
                test_loss += criterion(output.reshape(-1), target.float().reshape(-1)).item()  # sum up batch loss
                all_outputs.append(output.detach().cpu().numpy())
            all_targets.append(target.detach().cpu().numpy())
    test_loss /= len(test_loader.dataset)
    test_loss *= 100
    # concatenate the per-batch arrays (np.array would build a ragged object array
    # whenever the last batch is smaller than the others)
    all_outputs, all_targets = np.concatenate(all_outputs), np.concatenate(all_targets)
    custom_metric = roc_auc_score(all_targets.reshape(-1), all_outputs.reshape(-1))
    return test_loss, custom_metric
Important: the early-stopping function needs a score to minimize, so if you use ROC AUC, please pass it with a negative sign.
test_loss, custom_metric = test_nn(model, 'cuda', test_loader, criterion, task)
# print(test_loss)
early_stopping(test_loss, model)
# early_stopping(-custom_metric, model)  # use this instead to stop on the custom metric
if early_stopping.early_stop:
    print("Early stopping")
    print('LOSS:', test_loss)
    break
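The same pattern works for PR AUC, which you asked about. A minimal sketch using sklearn.metrics (precision_recall_curve plus auc; average_precision_score is a common alternative estimate):

from sklearn.metrics import auc, precision_recall_curve

def pr_auc_score(y_true, y_score):
    # area under the precision-recall curve for binary labels and positive-class scores
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return auc(recall, precision)

# inside test_nn, instead of the roc_auc_score line:
# custom_metric = pr_auc_score(all_targets.reshape(-1), all_outputs.reshape(-1))

Since PR AUC should also be maximized, pass it to early stopping with the negative sign as well, i.e. early_stopping(-custom_metric, model).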
Hope it helps! The code is already on GitHub.
Thanks for your response!
I tried the code and it now works just fine.
But there is one more thing I want to achieve: using a parameter like "eval_set" to compute the eval metric on predefined validation data, something like this:
model.fit(X_train, y_train,
          eval_set=(X_valid, y_valid))
Currently, I think you split the data inside the "pytorch_train_ann" function:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05,
                                                    # stratify=stratify
                                                    )
I did try to modify the splitting-related parts of your code by passing the X_valid and y_valid variables in directly; this includes the XGBClassifier fitting part, the TreeDrivenEncoder fitting part, and the split inside the pytorch_train_ann function.
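Roughly, the kind of behaviour I am after is sketched below (this is only my own illustration, not your code; the function name is made up):

from sklearn.model_selection import train_test_split

def resolve_eval_set(X, y, X_valid=None, y_valid=None, test_size=0.05):
    # If the caller supplies a validation set, use it as-is;
    # otherwise fall back to the current behaviour of splitting the training data.
    if X_valid is None or y_valid is None:
        return train_test_split(X, y, test_size=test_size)
    return X, X_valid, y, y_valid

# usage sketch:
# X_tr, X_va, y_tr, y_va = resolve_eval_set(X_train, y_train, X_valid, y_valid)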
However, this strange error occurred when calling model.fit():
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    119     # for object dtype data, we only check for NaNs (GH-13254)
    120     elif X.dtype == np.dtype("object") and not allow_nan:
--> 121         if _object_dtype_isnan(X).any():
    122             raise ValueError("Input contains NaN")
    123

AttributeError: 'bool' object has no attribute 'any'
The weird thing is that even when I went back and gave up the splitting modifications, I could still hit this error during fitting if I changed the batchsize to something like 64 when initializing the model.
I think it may be related to the size of the validation data, but I am not sure why it happens.
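My only guess, just from reproducing the symptom outside your code (so this is speculation, not a confirmed diagnosis), is that when the evaluation data size is not a multiple of the batch size, the per-batch outputs end up with different lengths; stacking them with np.array then gives a ragged object-dtype array, and on some numpy/sklearn versions the finite check fails exactly like above:

import numpy as np
from sklearn.metrics import roc_auc_score

# two "batches" of different length, e.g. a final incomplete batch of 40 with batchsize 64
batch_scores = [np.linspace(0, 1, 64), np.linspace(0, 1, 40)]
batch_labels = [np.arange(64) % 2, np.arange(40) % 2]

ragged_scores = np.array(batch_scores, dtype=object)  # ragged, object-dtype array
ragged_labels = np.array(batch_labels, dtype=object)

# On some numpy/sklearn versions the next call fails inside _assert_all_finite with
# AttributeError: 'bool' object has no attribute 'any'
# roc_auc_score(ragged_labels.reshape(-1, 1), ragged_scores.reshape(-1, 1))

# Concatenating the batches gives flat numeric arrays and works fine
print(roc_auc_score(np.concatenate(batch_labels), np.concatenate(batch_scores)))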
Finally, I really appreciate your help!
Thank you for your feedback!
I implemented support for a custom evaluation dataset and pushed it to the repo; please check the code.
To add a custom eval dataset, please do:
dtlf_model = DeepTFL(n_est=23, max_depth=3, drop=0.23, n_layers=4, task='class', batchsize=320)
dtlf_model.fit(X_train, y_train, X_val=X_test, y_val=y_test)
If you have any other issues, feel free to ping me again!
Well, it seems that you missed an "X_train" at
enc_X_train = self.TDE_encoder.transform(X)
and a "y_train" at
self.nn_model = pytorch_train_ann(enc_X_train, y, X_val, y_val, self.shape
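For clarity, this is what I changed locally (the arguments after self.shape are unchanged and just elided as ... here):

enc_X_train = self.TDE_encoder.transform(X_train)
...
self.nn_model = pytorch_train_ann(enc_X_train, y_train, X_val, y_val, self.shape, ...)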
But still, even after fixing all of the things above, I ran into this very strange error again and have no idea why.
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    119     # for object dtype data, we only check for NaNs (GH-13254)
    120     elif X.dtype == np.dtype("object") and not allow_nan:
--> 121         if _object_dtype_isnan(X).any():
    122             raise ValueError("Input contains NaN")
    123

AttributeError: 'bool' object has no attribute 'any'
Did you actually run the code after you modified it? Or is it just me who keeps running into this problem?
Hi,
I apologize for the late response. I fixed the code; it should work now.
Thank you for your feedback!
Alright, I have tested the code. It finally works, thank you.
Hi, very interesting work! I have tested the model on a binary classification task and feel fairly comfortable with it.
However, it now seems that the model can only early stop based on the loss,
so I want to know how to modify the code so that I can use the custom metric PR AUC (area under the precision-recall curve).
Just like the fit in Keras, that is, I want it to early stop when the PR AUC score cannot get any higher.
Can you give me some advice on where to modify the code, or any plan to support it?
I have tried to modify it by using PrecisionRecallCurve from torchmetrics and auc from sklearn.metrics to calculate the PR AUC,
but the model keeps triggering early stopping as if the score were supposed to decrease.
Thank you!