whitead / dmol-book

Deep learning for molecules and materials book

Scaffold Split #152

Open whitead opened 2 years ago

whitead commented 2 years ago

Probably add to the regression chapter. Also, rename the chapter.

kjappelbaum commented 2 years ago

This is a good one. A discussion of data leakage and other potential splitting techniques (time-based, leave-one-cluster-out, ...) could be interesting.

The MoleculeNet paper has some discussion about this for molecules; we have been looking at this for materials lately.
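
A minimal sketch of the idea behind these splits, assuming each sample already has a group key (a scaffold string, a cluster label, or a time bucket): whole groups go to either train or test, so closely related samples cannot leak across the split. The function name `group_split`, its parameters, and the toy keys are hypothetical and only for illustration; computing the keys themselves (e.g. Bemis-Murcko scaffolds) is left out.

```python
from collections import defaultdict


def group_split(group_keys, test_fraction=0.2):
    """Split sample indices so that no group appears in both train and test.

    group_keys: one hashable key per sample. This is an assumed input,
    e.g. a precomputed scaffold string, cluster label, or time bucket.
    """
    # Collect the sample indices belonging to each group.
    groups = defaultdict(list)
    for index, key in enumerate(group_keys):
        groups[key].append(index)

    # Largest groups first; ties broken by first index so the result
    # is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    # Greedily fill the training set; any group that would overflow the
    # training budget goes to the test set as a whole.
    train_cap = (1.0 - test_fraction) * len(group_keys)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy example: the keys stand in for scaffolds and are made up.
keys = ["benzene", "benzene", "benzene", "pyridine", "pyridine",
        "indole", "furan", "benzene", "indole", "thiophene"]
train_idx, test_idx = group_split(keys, test_fraction=0.3)

train_groups = {keys[i] for i in train_idx}
test_groups = {keys[i] for i in test_idx}
print("shared groups:", train_groups & test_groups)
```

Because groups are assigned whole, the realized test fraction only approximates the requested one. A random split of the same data would usually place some benzene samples on both sides, which is the leakage this kind of split is meant to avoid.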

whitead commented 2 years ago

Thanks @kjappelbaum - yes, I like your original idea of creating a chapter on best practices (#96), and this would be a good topic for it.