MSIpred is a Python 2 package that computes 22 somatic mutation features from tumor MAF (Mutation Annotation Format) data. These 22 features are then used to predict binary tumor microsatellite instability (MSI) status with a support vector machine (SVM) classifier.
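As a rough illustration of the per-tumor counting that mutation features can build on, the sketch below tallies mutations by `Variant_Type` for each `Tumor_Sample_Barcode`, the standard MAF columns for variant class and sample ID. `count_variant_types` is a hypothetical helper, not one of MSIpred's 22 features and not its implementation; it uses only the standard library.

```python
import csv
from collections import Counter, defaultdict


def count_variant_types(maf_lines):
    """Count mutations per tumor, split by MAF Variant_Type (e.g. SNP, INS, DEL).

    Hypothetical helper for illustration only; MSIpred computes its own features.
    `maf_lines` is any iterable of text lines from a tab-separated MAF file.
    """
    # MAF files may begin with '#' comment lines (e.g. '#version 2.4') before the header
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Tumor_Sample_Barcode"]][row["Variant_Type"]] += 1
    return counts
```

Passing it the lines of a MAF file, for example `open('toy.maf')`, returns one `Counter` of variant types per tumor barcode.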
These instructions will help you install MSIpred on your machine and build a bioinformatics pipeline for predicting tumor MSI status from tumor MAF files.
Python 2 >= 2.7
pandas >= 0.20.3
intervaltree >= 2.1.0
sklearn >= 0.19.1
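To check an environment against these minimum versions, a small helper can compare dotted version strings numerically rather than as text. This is a hypothetical sketch: it handles plain numeric versions such as `0.20.3` only (not suffixes like `rc1`), and how you collect the installed versions is left to you.

```python
# Minimum package versions from the prerequisites list
REQUIREMENTS = {
    "pandas": "0.20.3",
    "intervaltree": "2.1.0",
    "sklearn": "0.19.1",
}


def parse_version(version):
    """Turn a plain dotted version like '0.20.3' into a tuple of ints (0, 20, 3)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically.

    Tuples compare element by element, so '0.20.10' >= '0.20.3' holds
    (a plain string comparison would get this wrong).
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.1' and '2.1.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed_versions):
    """Return sorted names of packages that are missing or older than required.

    `installed_versions` maps package name -> version string (hypothetical input).
    """
    return sorted(
        name
        for name, minimum in REQUIREMENTS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    )
```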
Linux & OS X
Download the source code, change to the directory containing it, and run:
python setup.py install
The example below uses a toy MAF file, toy.maf, containing somatic mutation annotations for three colon tumors (COAD), and a reference file, simpleRepeat.txt, annotating simple repeat loci across the genome for GRCh38 (Genome Reference Consortium Human Build 38). simpleRepeat.txt can be obtained, gzip-compressed, from the UCSC genome annotation database at http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz
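The UCSC download is gzip-compressed, while the example reads a plain `simpleRepeat.txt`. The sketch below decompresses it with the standard library, then shows one way to test whether a mutation falls inside a simple repeat. It assumes the UCSC simpleRepeat table layout (tab-separated, a leading `bin` column followed by `chrom`, `chromStart`, `chromEnd`, with 0-based half-open coordinates) and 1-based MAF positions. All function names are hypothetical; MSIpred does its own tagging in `create_tagged_maf`, presumably with its `intervaltree` dependency rather than the linear scan used here.

```python
import gzip
import shutil
from collections import defaultdict


def decompress(gz_path, out_path):
    """Decompress e.g. simpleRepeat.txt.gz into simpleRepeat.txt."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def load_repeats(path):
    """Map chromosome -> list of (start, end) repeat intervals, 0-based half-open."""
    repeats = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            # Assumed UCSC layout: bin, chrom, chromStart, chromEnd, name, period, ...
            chrom, start, end = fields[1], int(fields[2]), int(fields[3])
            repeats[chrom].append((start, end))
    return repeats


def in_simple_repeat(repeats, chrom, position):
    """True if 1-based `position` on `chrom` lies inside any repeat interval.

    A 1-based position p is 0-based p - 1, which lies in [start, end)
    exactly when start < p <= end. Chromosome names must match the
    reference file, which uses UCSC 'chr1'-style names.
    """
    return any(start < position <= end for start, end in repeats.get(chrom, ()))
```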
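>>> import MSIpred as mp
>>> # Annotate the raw MAF against the simple-repeat reference
>>> toy_maf = mp.Raw_Maf(maf_path='toy.maf')
>>> toy_maf.create_tagged_maf(ref_repeats_file='simpleRepeat.txt', tagged_maf_file='tagged_toy.maf')
>>> # Compute the feature table from the tagged MAF
>>> tagged_toy_maf = mp.Tagged_Maf(tagged_maf_path='tagged_toy.maf')
>>> toy_features = tagged_toy_maf.make_feature_table(exome_size=44)
>>> # Predict MSI status from the feature table
>>> predicted_MSI = mp.msi_prediction(feature_table=toy_features, svm_model=None)
>>> # Train a new SVM model; training_y holds one label per tumor
>>> new_model = mp.svm_training(training_X=toy_features, training_y=[0, 1, 1])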
The returned SVM model object, new_model, can then be used for MSI prediction by passing it as the svm_model argument of mp.msi_prediction.
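For intuition about what a trained SVM model does with a feature table, the sketch below trains a linear SVM in plain Python using the Pegasos update (stochastic subgradient descent on the L2-regularized hinge loss), with one 0/1 label per feature row as in `training_y=[0, 1, 1]`. This is a swapped-in illustration, not MSIpred's classifier, which presumably relies on scikit-learn (a listed dependency); the function names, hyperparameters, and toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Train a linear SVM with the Pegasos subgradient update (illustrative only).

    X: list of feature rows (one per tumor); y: list of 0/1 labels, one per row.
    A constant 1.0 is appended to each row so the last weight acts as a bias
    (this bias is regularized too, a common simplification).
    Returns the weight vector as a list of floats.
    """
    if len(X) != len(y):
        raise ValueError("need exactly one label per feature row")
    rows = [list(map(float, row)) + [1.0] for row in X]
    signs = [1.0 if label == 1 else -1.0 for label in y]  # hinge loss uses +/-1
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for x, s in zip(rows, signs):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = s * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the L2 regularizer, applied on every update
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward classifying x correctly
                w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w


def predict(w, X):
    """Predict 0/1 labels: 1 where the decision value w.x + b is positive."""
    return [
        1 if sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) > 0 else 0
        for row in X
    ]
```

For example, `w = train_linear_svm([[0.2, 0.1], [0.1, 0.3], [2.0, 1.8], [1.9, 2.2]], [0, 0, 1, 1])` learns a boundary that `predict(w, ...)` then applies to new rows, mirroring how new_model is reused for prediction.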
Chen Wang
This project is licensed under the MIT License; see LICENSE for more information: https://github.com/wangc29/MSIpred/blob/master/LICENSE