This PR adds code to test a few simple domain adaptation (DA) methods (CORAL and TCA) on mutation prediction across cancer types. Both methods follow the same recipe: apply an unsupervised DA algorithm to align the training data to the test data, then train our models on the aligned training data and evaluate on the test data.
I can't remember if WENDA transforms the data this way (I think it learns a set of feature weights, but I don't remember the details), but I think you should just be able to call your code with the same train/test data that we're using in CORAL and TCA.
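For reference, here is a rough sketch of the "align train to test" step in the style of CORAL, written in plain NumPy. It is not the code in this PR and does not use the transfertools API. The function names (`coral_align`, `_sym_matrix_power`), the `reg` parameter, and the toy data are all made up for illustration. The idea is to whiten the training features with the training covariance, then re-color them with the test covariance, using no labels from either set.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a (fractional) power."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(X_train, X_test, reg=1e-6):
    """CORAL-style alignment (hypothetical helper, not the transfertools API).

    Transforms X_train so that its feature covariance approximately matches
    that of X_test. `reg` adds a small ridge to keep both covariances
    invertible; its default here is an assumption.
    """
    n_features = X_train.shape[1]
    cov_train = np.cov(X_train, rowvar=False) + reg * np.eye(n_features)
    cov_test = np.cov(X_test, rowvar=False) + reg * np.eye(n_features)
    whiten = _sym_matrix_power(cov_train, -0.5)   # remove train correlations
    recolor = _sym_matrix_power(cov_test, 0.5)    # impose test correlations
    return X_train @ whiten @ recolor


# Toy data standing in for two cancer types with different feature scales.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5)) * 3.0
X_test = rng.normal(size=(150, 5)) * 0.5 + 1.0

X_train_aligned = coral_align(X_train, X_test)
# A model would then be fit on (X_train_aligned, y_train) and scored on X_test.
```

Only the training features are transformed; the test features are left alone, so the same train/test split can be passed to any method that follows this pattern.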
Main code changes:
02_cancer_type_classification/domain_adaptation.ipynb: analysis notebook for DA results
02_cancer_type_classification/run_cancer_type_classification.py: add flags for CORAL and TCA parameters (this is a bit of a mess but it works...)
pancancer_evaluation/utilities/tcga_utilities.py: functions to call the domain adaptation code in the Python transfertools package
Most other changes are just boilerplate or things copied over from the mpmp repo.