This analysis answers the following questions about the coefficients of mutation prediction models fit to pan-cancer data:
How often do the elastic net models use the one-hot encoded cancer type features (i.e., leave them with nonzero coefficients after fitting)?
How often do the models use mutation burden (i.e., leave its coefficient nonzero)?
Which genes do the models use? Are these mostly the most variable genes (i.e., those with the highest mean absolute deviation), or do the models also use some less variable genes?
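Here is a minimal Python sketch of how these three questions could be answered from fitted coefficients. It makes several assumptions: each model's coefficients are stored as a dict mapping feature name to value, the one-hot cancer type features share the hypothetical prefix `cancer_type_`, and mutation burden is a single feature with the hypothetical name `log10_mut`. Every other feature is treated as a gene. A model counts as "using" cancer type if any cancer type coefficient is nonzero. Genes are ranked by mean absolute deviation around the mean, with rank 1 being the most variable.

```python
from statistics import fmean

# Hypothetical feature-naming convention (an assumption, not from the source):
# one-hot cancer type columns carry this prefix, and mutation burden is a
# single column with this name.
CANCER_TYPE_PREFIX = "cancer_type_"
MUT_BURDEN_FEATURE = "log10_mut"


def is_gene(feature):
    """Treat every feature that is not a covariate as a gene feature."""
    return not feature.startswith(CANCER_TYPE_PREFIX) and feature != MUT_BURDEN_FEATURE


def covariate_usage(models):
    """Return (fraction of models with >= 1 nonzero cancer type coefficient,
    fraction of models with a nonzero mutation burden coefficient).

    `models` is a list of dicts mapping feature name -> fitted coefficient.
    """
    n = len(models)
    uses_cancer_type = sum(
        any(c != 0 for f, c in m.items() if f.startswith(CANCER_TYPE_PREFIX))
        for m in models
    )
    uses_mut_burden = sum(m.get(MUT_BURDEN_FEATURE, 0.0) != 0 for m in models)
    return uses_cancer_type / n, uses_mut_burden / n


def mean_absolute_deviation(values):
    """Mean absolute deviation around the mean: mean(|x - mean(x)|)."""
    mu = fmean(values)
    return fmean(abs(v - mu) for v in values)


def mad_ranks_of_used_genes(model, expression):
    """Return {gene: MAD rank} for genes with nonzero coefficients in `model`.

    `expression` maps gene -> list of per-sample values. Rank 1 is the most
    variable gene (highest mean absolute deviation).
    """
    by_mad = sorted(
        expression,
        key=lambda g: mean_absolute_deviation(expression[g]),
        reverse=True,
    )
    rank = {gene: i + 1 for i, gene in enumerate(by_mad)}
    return {f: rank[f] for f, c in model.items() if is_gene(f) and c != 0 and f in rank}
```

To check whether the models stick to the top of the variability ranking, compare the MAD ranks of the selected genes against the total number of genes. Selected genes far down the list are the "less variable genes" the third question asks about.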
For each of these questions, we split the coefficients into those from well-performing models (statistically better than a negative control trained on shuffled labels) and those from poorly performing models (statistically indistinguishable from, or worse than, the negative control).
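The text does not say which statistical test separates well-performing models from poorly performing ones. As a stand-in, the sketch below uses an exact one-sided paired sign-flip permutation test on per-fold scores, for example one score per cross-validation fold for the true-label model and one for its shuffled-label control. The score format, the `alpha=0.05` threshold, and the function names are all assumptions for illustration.

```python
from itertools import product
from statistics import fmean


def sign_flip_p_value(true_scores, shuffled_scores):
    """One-sided exact paired sign-flip permutation test (stand-in test).

    Null hypothesis: within each pair (e.g. each CV fold), the true-label and
    shuffled-label scores are exchangeable. Returns the fraction of the 2**n
    sign assignments whose mean difference is >= the observed mean difference.
    """
    if len(true_scores) != len(shuffled_scores):
        raise ValueError("scores must be paired")
    diffs = [t - s for t, s in zip(true_scores, shuffled_scores)]
    observed = fmean(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so the observed assignment always counts itself.
        if fmean(sg * d for sg, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


def split_by_performance(coefs_by_target, scores_by_target, alpha=0.05):
    """Split fitted coefficients into well- and poorly-performing groups.

    `coefs_by_target` maps target gene -> {feature: coefficient}.
    `scores_by_target` maps target gene -> (true_scores, shuffled_scores).
    Models that beat their shuffled-label control at level `alpha` are
    well-performing; all others (same or worse) are poorly performing.
    """
    well, poor = {}, {}
    for target, coefs in coefs_by_target.items():
        true_scores, shuffled_scores = scores_by_target[target]
        p = sign_flip_p_value(true_scores, shuffled_scores)
        group = well if p < alpha else poor
        group[target] = coefs
    return well, poor
```

The test enumerates all 2**n sign assignments, so the smallest achievable p-value is 1/2**n. At least five paired scores are therefore needed to reach p < 0.05, and full enumeration is only practical for small n.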