kjappelbaum / mofdscribe

An ecosystem for digital reticular chemistry
https://mofdscribe.readthedocs.io/en/latest/
MIT License

implement helper for y-scrambling #240

Open kjappelbaum opened 2 years ago

kjappelbaum commented 2 years ago

https://pubs.acs.org/doi/10.1021/ci700157b

https://onlinelibrary.wiley.com/doi/epdf/10.1002/qsar.200390007

The most prominent use I know of: https://www.science.org/doi/10.1126/science.aat8603

Some discussion about its usefulness is here: https://stat.ethz.ch/pipermail/r-help/2010-March/230856.html, in particular:

That sounds like a particular form of permutation test. If the "scrambling" is replaced by sampling with replacement (i.e., some data points can be sampled more than once while others can be left out), that's the simple (or nonparametric) bootstrap. The goal is to generate the distribution of the statistic of interest (R^2 or q^2) under the null hypothesis that there's no relationship between the activity (or property) and the structure. To make the "test" valid, one needs to ensure that the entire model building process is carried through for all of the sampled data, including feature selections, etc.

Which makes sense to me
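
A minimal sketch of the permutation test described in the quote, using only the standard library. The names `permutation_p_value` and `squared_correlation` are hypothetical, and `build_and_score` is a stand-in for the entire model-building process (feature selection, fitting, scoring), which has to be rerun for every permuted `y`. The `(count + 1) / (n + 1)` estimate of the p-value is one common convention, not something the thread prescribes.

```python
import random


def squared_correlation(x, y):
    """Toy stand-in for a full pipeline: squared Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy) if sxx > 0 and syy > 0 else 0.0


def permutation_p_value(build_and_score, X, y, n_permutations=200, seed=0):
    """Compare the observed score with scores obtained on permuted targets."""
    rng = random.Random(seed)
    observed = build_and_score(X, y)

    null_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any relationship between X and y
        # The whole pipeline is rerun on the permuted targets.
        null_scores.append(build_and_score(X, y_perm))

    # Fraction of permuted scores at least as good as the observed one,
    # with +1 in numerator and denominator so the estimate is never zero.
    n_at_least = sum(s >= observed for s in null_scores)
    p_value = (n_at_least + 1) / (n_permutations + 1)
    return observed, null_scores, p_value
```

Calling `permutation_p_value(squared_correlation, x, y)` returns the observed score, the list of scores under the null, and the estimated p-value.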

kjappelbaum commented 2 years ago

Perhaps there is no need, as there's already https://github.com/omixlab/y-scramble.

However, they scramble only the train set (?) and leave the test set intact

kjappelbaum commented 2 years ago

Perhaps this would be an interesting benchmark #244

kjappelbaum commented 2 years ago

more details here https://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf

kjappelbaum commented 2 years ago

I think a valid implementation would be the following:

  1. Bootstrap resample $n$ times
     a. Train model on X, y. Measure some score.
     b. Shuffle y. Train model. Measure some score.
     c. Compute the mean/median/... difference
     d. Append to array

In this way, we have a bootstrapped effect size.
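
A rough sketch of how steps a to d might look, using only the standard library. Everything here is an assumption for illustration: the function names are hypothetical, the "model" is a one-feature least-squares line, the score is in-sample R^2, and there is a single shuffle per bootstrap resample, so the difference in step c is a plain difference. A real implementation would plug in the actual model and score on held-out data.

```python
import random
import statistics


def fit_line(x, y):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def r2(model, x, y):
    """Coefficient of determination of the fitted line on (x, y)."""
    slope, intercept = model
    my = statistics.fmean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def bootstrapped_y_scrambling(x, y, n_bootstraps=200, seed=0):
    """Return one (true score - scrambled score) difference per bootstrap resample."""
    rng = random.Random(seed)
    n = len(x)
    differences = []
    for _ in range(n_bootstraps):
        # Bootstrap resample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]

        # a. Train on the resampled X, y and measure the score.
        true_score = r2(fit_line(xb, yb), xb, yb)

        # b. Shuffle y, retrain, and measure the score again.
        y_shuffled = yb[:]
        rng.shuffle(y_shuffled)
        scrambled_score = r2(fit_line(xb, y_shuffled), xb, y_shuffled)

        # c. + d. Compute the difference and append it to the array.
        differences.append(true_score - scrambled_score)
    return differences
```

The returned list can then be summarized, for example with `statistics.fmean` or `statistics.quantiles`, to report the effect size together with its spread.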

kjappelbaum commented 2 years ago

probably best if we make a dedicated small package for this

kjappelbaum commented 2 years ago

giving it a shot here https://github.com/kjappelbaum/yscrambler