molML / MoleculeACE

A tool for evaluating the predictive performance of machine learning models on activity cliff compounds
MIT License

Is there any validation split? #5

Closed smiles724 closed 1 year ago

smiles724 commented 1 year ago

Hi, thanks for sharing the code.

However, as far as I can tell, you only split the data into training and test sets and omit a validation split.

It is important to have both a training set and a validation set; otherwise, you have no way of knowing when to stop training or which model checkpoint is best. I don't believe cross-validation by itself avoids the drawback of a missing validation split.

Can you please give me a potential answer?

githubXin123 commented 1 year ago

@smiles724 Hello, the author employed an early stopping strategy within cross-validation to mitigate overfitting.

smiles724 commented 1 year ago

> @smiles724 Hello, the author employed early stopping strategy in cross-validation to mitigate overfitting.

I am afraid early stopping alone is not the correct way to select the best model. In other words, the training process should be monitored on a validation set, which is missing here.

githubXin123 commented 1 year ago

@smiles724 First, the author divided the dataset into a training set and a test set. Then, the author applied 5-fold cross-validation to the training set, where in each fold, one-fifth of the data was used as a validation set to evaluate the model.
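For anyone landing here, the protocol described above can be sketched as follows. This is not the authors' exact code, just an illustrative outline using scikit-learn: an outer train/test split, then 5-fold cross-validation on the training portion, where each fold's held-out fifth serves as the validation set (e.g. for early stopping). The toy arrays and seed are made up for the example.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Toy data standing in for molecular features and activity labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# 1) Outer split: the test set is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2) 5-fold cross-validation on the training set: in each fold,
#    one-fifth of the training data is held out as a validation set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, val_idx) in enumerate(kf.split(X_train)):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
    # ...fit the model on (X_tr, y_tr), monitor loss on (X_val, y_val)
    # for early stopping and model selection...
    print(f"fold {fold}: train={len(tr_idx)}, val={len(val_idx)}")
```

So the validation data does exist; it is the held-out fold inside each cross-validation split, rather than a single fixed validation set.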