mahmoodlab / UNI

Towards a general-purpose foundation model for computational pathology - Nature Medicine
Other
351 stars 48 forks source link

Dataset Split #46

Open lingxitong opened 1 week ago

lingxitong commented 1 week ago

Nice Work! I have some questions about the dataset split of the downstream task.For the train-test split (e.g., CRC-100K dataset), my understanding is that you train for one epoch on the training set, then test once on the test set, and finally report the best result. Alternatively, within the train split, you further divide it into training and validation sets, then use the validation set to select the best model before testing on the test set.Can you tell me you conduct which one?I will follow your answer to process my dataset the same way.Thanks!

Richarizardd commented 5 days ago

Hi @lingxitong - We simply fit a logistic regression model via L-BFGS on the entire train set using a adaptable cost (w.r.t. to the embedding dimension size). We do not divide into train / validation sets (no hyper-parameter tuning or model selection).