rvinas / HYFA

Hypergraph Factorisation
MIT License

The difference between testing dataset and validation dataset #6

Gongmian784 closed 10 months ago

Gongmian784 commented 10 months ago

Hi, I found the GTEx bulk RNA-seq donors were divided into three parts (training, validation, and testing donors). I can grasp the purposes of the training and validation subsets in relation to the Hypergraph model's training and accuracy validation respectively, but I cannot fully comprehend the role of the testing dataset. Could anyone elaborate on the specific purpose of the testing dataset and how it differs from the validation dataset? Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?

Thanks in advance! Mian

rvinas commented 10 months ago

Hi Mian, thank you for your interest in our work. We used the test dataset to evaluate the model's performance on data from individuals who were neither observed at training time nor used for hyperparameter optimisation (the latter are the validation individuals).

"Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?"

It depends on what your objective is. If you are interested in evaluating the performance of the model on unseen data, then you should use a separate test dataset. The hyperparameters of the model were chosen to maximise performance on the validation individuals, so validation performance is likely to be optimistically biased and might not be an accurate estimate of the generalisation performance.
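For illustration, a donor-level three-way split could look roughly like the sketch below. This is not the code used in HYFA; the function name `split_donors`, the split fractions, and the toy `samples` table are all assumptions made up for the example. The point it shows is that the split is done over donors rather than over individual samples, so every tissue sample from a test donor stays out of both training and hyperparameter selection.

```python
import random


def split_donors(donor_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Partition unique donor IDs into disjoint train/val/test sets.

    Hypothetical helper for illustration only; fractions are assumptions.
    """
    donors = sorted(set(donor_ids))  # deduplicate, fix an order for reproducibility
    rng = random.Random(seed)
    rng.shuffle(donors)

    n_val = int(len(donors) * val_frac)
    n_test = int(len(donors) * test_frac)

    test = set(donors[:n_test])                # held out until the final evaluation
    val = set(donors[n_test:n_test + n_val])   # used for hyperparameter selection
    train = set(donors[n_test + n_val:])       # used to fit the model
    return train, val, test


# Toy sample table (made up): each donor contributes samples from two tissues.
samples = [(f"donor_{i:03d}", tissue)
           for i in range(100)
           for tissue in ("lung", "liver")]

train, val, test = split_donors(d for d, _ in samples)

# Assign samples to a subset via their donor, so no donor spans two subsets.
train_samples = [s for s in samples if s[0] in train]
val_samples = [s for s in samples if s[0] in val]
test_samples = [s for s in samples if s[0] in test]
```

With a split like this, hyperparameters would be tuned against `val_samples` only, and `test_samples` would be scored once at the end. If you merge validation and test into one subset, the reported number is still useful for model selection, but it should not be read as an unbiased estimate of performance on new donors.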