greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

LASSO penalty range/model size experiments #64

Closed jjc2718 closed 1 year ago

jjc2718 commented 1 year ago

We want to see whether smaller models (i.e. models with fewer nonzero features) tend to generalize to new cancer types better than larger ones. The conventional wisdom (we think) is that smaller, more parsimonious models are more robust and generalize better, and we thought it would be interesting to test this empirically using my leave-one-cancer-out cross-validation setup.
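
For readers who want the shape of the experiment, here is a minimal sketch and not the code in this PR. It assumes an L1-penalized logistic regression from scikit-learn, synthetic data, and made-up group labels standing in for cancer types; `lasso_size_vs_holdout` is a hypothetical helper name. For each penalty strength it holds out one group at a time, then records the number of nonzero coefficients and the AUROC on the held-out group.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut


def lasso_size_vs_holdout(X, y, groups, c_values):
    """For each L1 strength and held-out group, record model size and AUROC."""
    results = []
    for c in c_values:
        # Smaller C means a stronger L1 penalty, which usually gives fewer nonzero features.
        for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
            # AUROC is undefined when only one class is present, so skip those folds.
            if len(np.unique(y[train_idx])) < 2 or len(np.unique(y[test_idx])) < 2:
                continue
            model = LogisticRegression(penalty="l1", solver="liblinear", C=c)
            model.fit(X[train_idx], y[train_idx])
            scores = model.decision_function(X[test_idx])
            results.append({
                "C": c,
                "held_out": str(groups[test_idx][0]),
                "n_nonzero": int(np.count_nonzero(model.coef_)),
                "auroc": float(roc_auc_score(y[test_idx], scores)),
            })
    return results


# Synthetic stand-in data: 3 made-up groups, 50 features, 2 of them informative.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 50
X = rng.normal(size=(n_samples, n_features))
noise = rng.normal(scale=0.5, size=n_samples)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)
groups = np.repeat(["type_A", "type_B", "type_C"], n_samples // 3)

results = lasso_size_vs_holdout(X, y, groups, c_values=[0.01, 0.1, 1.0])
for row in results:
    print(row)
```

Each row pairs a model size with a held-out performance value, which is the raw material for the comparison described above.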

On this very small proof of concept (just TP53 for a few cancer types so far), there doesn't seem to be any correlation between model size (number of nonzero features) and generalization performance:

image

The next step is to scale up to more genes/cancer types, and hopefully to develop a rigorous way (e.g. a statistical test of some sort) to evaluate this on a larger dataset.
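
One possible starting point for such a test, offered as an assumption and not a settled choice, is a Spearman rank correlation between model size and held-out performance with a permutation p-value. The sketch below uses only the standard library. The helper names are hypothetical and the input numbers are made up purely to show the call, not results from this project.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up illustration values (NOT experimental results).
model_sizes = [5, 12, 40, 85, 150, 300]
holdout_aurocs = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68]
rho, p_value = permutation_pvalue(model_sizes, holdout_aurocs)
print(f"Spearman rho = {rho:.3f}, permutation p = {p_value:.3f}")
```

Folds that share a gene or a held-out cancer type are not independent, so a plain permutation test like this is likely too optimistic on the full dataset. A per-gene analysis or a model that accounts for that grouping would probably be needed.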
