We want to see whether smaller models (i.e., models with fewer nonzero features) tend to generalize to new cancer types better than larger ones. The conventional wisdom (we think) is that smaller, more parsimonious models tend to be more robust and to generalize better, and we thought it would be interesting to test this empirically using our leave-one-cancer-out cross-validation setup.
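For concreteness, here's a minimal sketch of what that setup might look like, assuming a LASSO logistic regression classifier of mutation status; `X`, `y`, and `cancer_types` are hypothetical stand-ins for the real data, not our actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_cancer_out(X, y, cancer_types, C=1.0):
    """Hold out each cancer type in turn, train on the rest.

    X, y, cancer_types: NumPy arrays (features, labels, group labels).
    Returns a list of (held-out cancer type, model size, test AUROC).
    """
    results = []
    for train_ix, test_ix in LeaveOneGroupOut().split(X, y, groups=cancer_types):
        # L1 penalty zeroes out features, so the nonzero coefficient
        # count serves as the "model size" we want to track
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[train_ix], y[train_ix])
        n_nonzero = int(np.count_nonzero(model.coef_))
        auc = roc_auc_score(y[test_ix], model.predict_proba(X[test_ix])[:, 1])
        results.append((cancer_types[test_ix][0], n_nonzero, auc))
    return results
```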
On this very small proof of concept (just TP53 for a few cancer types so far), we don't see any correlation between model size (number of nonzero features) and generalization performance.
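As a quick way to quantify "no correlation," one could rank-correlate model size with held-out performance; this sketch assumes the `results` list from the hypothetical `leave_one_cancer_out()` function above:

```python
from scipy.stats import spearmanr

# results from the leave_one_cancer_out() sketch above
sizes = [n_nonzero for _, n_nonzero, _ in results]
aucs = [auc for _, _, auc in results]

# rank correlation between model size and held-out AUROC
rho, p_value = spearmanr(sizes, aucs)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```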
The next step is to scale up to more genes and cancer types, and hopefully to develop a rigorous way (e.g., some kind of statistical test) to evaluate this on a larger dataset.
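One possibility (an assumption on our part, not a settled plan) would be to compute a per-gene correlation between model size and held-out performance, then test whether those correlations are centered on zero across genes:

```python
from scipy.stats import spearmanr, wilcoxon

def size_vs_generalization_test(per_gene_results):
    """per_gene_results: {gene: (sizes, aucs)} across held-out cancer types."""
    rhos = []
    for sizes, aucs in per_gene_results.values():
        rho, _ = spearmanr(sizes, aucs)  # per-gene size/performance correlation
        rhos.append(rho)
    # Wilcoxon signed-rank test of H0: per-gene correlations are centered on 0
    stat, p_value = wilcoxon(rhos)
    return rhos, stat, p_value
```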