Open whitead opened 2 years ago
This is a good one. A discussion of data leakage and of other splitting techniques (time-based, leave-one-cluster-out, ...) could be interesting.
The MoleculeNet paper has some discussion of this for molecules, and we have been looking at it for materials lately.
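To make the two named splits concrete, here is a minimal sketch in plain Python. It is only an illustration, not the method from the MoleculeNet paper: the function names `leave_one_cluster_out` and `time_based_split` are hypothetical, and it assumes each data point already has a cluster label (for example from a prior clustering step) or a timestamp.

```python
from collections import defaultdict


def leave_one_cluster_out(cluster_ids):
    """Yield (held_out_cluster, train_idx, test_idx), one fold per cluster.

    Assumes cluster_ids[i] is the (sortable) cluster label of point i.
    """
    by_cluster = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        by_cluster[c].append(i)
    for held_out in sorted(by_cluster):
        test_idx = by_cluster[held_out]
        # Train on every point that is NOT in the held-out cluster.
        train_idx = sorted(
            i for c, idx in by_cluster.items() if c != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def time_based_split(timestamps, test_fraction=0.2):
    """Return (train_idx, test_idx): oldest points train, newest points test."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_test = max(1, int(round(len(order) * test_fraction)))
    return order[:-n_test], order[-n_test:]


# Toy data (made up for illustration only).
clusters = ["a", "a", "b", "c", "b", "c"]
years = [2015, 2011, 2019, 2013, 2021, 2017]

for held_out, train_idx, test_idx in leave_one_cluster_out(clusters):
    # No index appears in both sets, so no cluster leaks across the split.
    assert not set(train_idx) & set(test_idx)

train_idx, test_idx = time_based_split(years)
```

In both cases the test set is chosen by a property of the data (cluster membership or time) rather than at random, so near-duplicates of test points are less likely to end up in the training set.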
Thanks @kjappelbaum - yes, I like your original idea of creating a chapter on best practices (#96), and this would be a good topic for it.
Probably add to the regression chapter. Also, rename the chapter.