This PR does two different things. Unfortunately they ended up being somewhat coupled, so it would have been hard to split them into separate PRs.
Add a patient sex covariate for MSI prediction experiments (this didn't end up changing the results much; generalization to UCEC is still difficult).
Implement patient sex prediction. The results here were somewhat interesting: we expected very small models to work best (e.g. models including only genes on the X/Y chromosomes). This doesn't seem to be the case, although performance is good overall, even on held-out cancer types.
The best-performing LASSO penalty parameters generally correspond to models with hundreds to thousands of features. I haven't yet checked whether these features tend to be on the X and Y chromosomes, although I may do that in the future.
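
For reference, the follow-up check mentioned above could look roughly like the sketch below. It is not code from this PR: the function name `sex_chromosome_fraction`, the `coefs` and `gene_to_chrom` inputs, and the toy coefficient values are all hypothetical, and it assumes the fitted LASSO coefficients are available as a gene-to-coefficient mapping alongside a gene-to-chromosome annotation.

```python
def sex_chromosome_fraction(coefs, gene_to_chrom, tol=0.0):
    """Summarize how many selected (nonzero-coefficient) genes lie on X or Y.

    coefs: dict mapping gene symbol -> fitted LASSO coefficient (hypothetical input)
    gene_to_chrom: dict mapping gene symbol -> chromosome name (hypothetical input)
    tol: coefficients with absolute value <= tol are treated as zero
    Returns (n_selected, n_on_xy, fraction_on_xy).
    """
    # LASSO drives many coefficients to exactly zero; keep only the rest.
    selected = [gene for gene, coef in coefs.items() if abs(coef) > tol]
    # Genes missing from the annotation are counted as not on X/Y.
    on_xy = [gene for gene in selected if gene_to_chrom.get(gene) in {"X", "Y"}]
    fraction = len(on_xy) / len(selected) if selected else 0.0
    return len(selected), len(on_xy), fraction


# Toy example: the coefficient values are made up for illustration only.
toy_coefs = {"XIST": 1.2, "RPS4Y1": -0.9, "TP53": 0.0, "KRAS": 0.3}
toy_chroms = {"XIST": "X", "RPS4Y1": "Y", "TP53": "17", "KRAS": "12"}

n_selected, n_on_xy, frac = sex_chromosome_fraction(toy_coefs, toy_chroms)
print(f"{n_on_xy} of {n_selected} selected genes are on X/Y ({frac:.0%})")
```

Running this once per LASSO penalty value would show whether the larger, better-performing models are still dominated by X/Y genes or draw mostly on autosomal features.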