Splits created with Create Split Seq may mix slides between the training, validation, and test sets, leading to data leakage.

In the current official tumor vs. normal example provided in the repository, if Slide_13 is used as test data in Splits0 but as training data in Splits1, the result is data leakage. When the same data point appears in the test set of one split and the training set of another, the evaluation is skewed: reported performance becomes over-optimistic and the results are no longer accurate.

To ensure accurate evaluation results, it is critical that no data point is shared between the training, validation, and test sets, even across different splits. If a slide is used for testing in one split, it should not appear in the training or validation set of any other split.
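A check along these lines could make the problem easy to reproduce. This is only a minimal sketch: the function names `within_split_overlap` and `cross_split_test_reuse` are hypothetical (not part of CLAM), and it assumes each split has already been loaded into a dict with `train`, `val`, and `test` sets of slide IDs. The toy data is made up to mirror the Slide_13 scenario described above.

```python
from itertools import combinations


def within_split_overlap(split):
    """Return slide IDs that appear in more than one set of a single split."""
    overlap = set()
    for a, b in combinations(("train", "val", "test"), 2):
        overlap |= set(split[a]) & set(split[b])
    return overlap


def cross_split_test_reuse(splits):
    """Find slides tested in one split that are in train/val of another split.

    Returns a dict mapping slide ID -> list of (test_split, other_split) index pairs.
    """
    reused = {}
    for i, s_i in enumerate(splits):
        for slide in s_i["test"]:
            for j, s_j in enumerate(splits):
                if i != j and (slide in s_j["train"] or slide in s_j["val"]):
                    reused.setdefault(slide, []).append((i, j))
    return reused


# Hypothetical toy data (assumption): two splits mirroring the case above,
# where Slide_13 is test data in split 0 and training data in split 1.
splits = [
    {"train": {"Slide_1", "Slide_2"}, "val": {"Slide_3"}, "test": {"Slide_13"}},
    {"train": {"Slide_13", "Slide_1"}, "val": {"Slide_2"}, "test": {"Slide_3"}},
]

for idx, split in enumerate(splits):
    print(f"split {idx}: overlap within split = {within_split_overlap(split)}")

print("test slides reused elsewhere:", cross_split_test_reuse(splits))
```

The first function checks the within-split condition, the second the stricter across-split condition requested here. Loading the actual split files into this dict shape is left out, since it depends on how the files are stored.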

mahmoodlab/CLAM, issue #269