wangc29 / MSIpred

Microsatellite instability prediction
MIT License

MSIpred

MSIpred is a Python 2 package that computes 22 somatic mutation features from tumor MAF (Mutation Annotation Format) data. These 22 features are then used to predict binary tumor microsatellite instability (MSI) status with a support vector machine (SVM) classifier.

Getting Started

These instructions will help you install MSIpred on your machine, and build a bioinformatics pipeline for tumor MSI status prediction from tumor MAF files.

Prerequisites

Python 2 >= 2.7
pandas >= 0.20.3
intervaltree >= 2.1.0
sklearn >= 0.19.1
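Before installing, it can be useful to confirm that the listed dependencies are importable and recent enough. The following is a minimal sketch, not part of MSIpred: the names REQUIREMENTS, version_tuple and check_requirements are hypothetical helpers introduced here, and the check assumes each package exposes a __version__ attribute (packages that do not are reported as "version unknown").

```python
import importlib
import re

# Minimum versions copied from the prerequisites list (import names, not PyPI names).
REQUIREMENTS = {"pandas": "0.20.3", "intervaltree": "2.1.0", "sklearn": "0.19.1"}


def version_tuple(version):
    """Turn a string such as '0.20.3' into (0, 20, 3) using its first three numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def check_requirements(requirements):
    """Return a dict mapping each module name to a short status string."""
    report = {}
    for name, minimum in requirements.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(module, "__version__", None)
        if found is None:
            report[name] = "installed (version unknown)"
        elif version_tuple(found) >= version_tuple(minimum):
            report[name] = "ok (%s)" % found
        else:
            report[name] = "too old (%s < %s)" % (found, minimum)
    return report
```

Calling check_requirements(REQUIREMENTS) returns one status per dependency, so a missing or outdated package can be spotted before running setup.py.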

Installation

Linux & OS X

Download the source code, change to the directory that contains it, and run

python setup.py install

Examples

Example Data

The example data consist of two files: a toy MAF file, toy.maf, containing somatic mutation annotations of three colon tumors (COAD), and a reference file, simpleRepeat.txt, annotating the loci of simple repeats throughout the genome for GRCh38 (Genome Reference Consortium Human Reference 38). The reference file can be obtained from the UCSC genome annotation database at http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz
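The UCSC file is gzip-compressed, while the example pipeline reads a plain simpleRepeat.txt, so it has to be downloaded and unpacked first. A minimal sketch using only the standard library is shown here; gunzip and fetch_simple_repeats are hypothetical helper names introduced for illustration and are not part of MSIpred, and the default output paths are assumptions.

```python
import gzip
import shutil

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

# URL taken from the example data description.
SIMPLE_REPEAT_URL = (
    "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz"
)


def gunzip(src_path, dst_path):
    """Decompress a .gz file to dst_path and return dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def fetch_simple_repeats(url=SIMPLE_REPEAT_URL,
                         gz_path="simpleRepeat.txt.gz",
                         txt_path="simpleRepeat.txt"):
    """Download the compressed annotation and unpack it next to it."""
    urlretrieve(url, gz_path)
    return gunzip(gz_path, txt_path)
```

Running fetch_simple_repeats() once in the working directory should leave a simpleRepeat.txt that can be passed as ref_repeats_file.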

Build a bioinformatics pipeline for tumor MSI prediction

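>>> import MSIpred as mp
>>> # Load the raw MAF file
>>> toy_maf = mp.Raw_Maf(maf_path='toy.maf')
>>> # Tag the mutations using the simple-repeat reference and write a tagged MAF file
>>> toy_maf.create_tagged_maf(ref_repeats_file='simpleRepeat.txt', tagged_maf_file='tagged_toy.maf')
>>> # Load the tagged MAF file
>>> tagged_toy_maf = mp.Tagged_Maf(tagged_maf_path='tagged_toy.maf')
>>> # Compute the feature table for the given exome size
>>> toy_features = tagged_toy_maf.make_feature_table(exome_size=44)
>>> # Predict MSI status from the feature table
>>> predicted_MSI = mp.msi_prediction(feature_table=toy_features, svm_model=None)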
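To make the tagging step more concrete, the sketch below shows one way to decide whether a mutation position falls inside a simple-repeat locus. It is not MSIpred's implementation (the package lists intervaltree as a dependency, whereas this sketch uses a sorted list with bisect), and load_repeats and in_repeat are hypothetical names. It assumes the UCSC simpleRepeat column order (bin, chrom, chromStart, chromEnd, ...) with 0-based, half-open coordinates, 1-based MAF positions, and matching chromosome names in both files.

```python
import bisect
from collections import defaultdict


def load_repeats(lines):
    """Build {chrom: (starts, ends)} from tab-separated simpleRepeat rows."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        by_chrom[fields[1]].append((int(fields[2]), int(fields[3])))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            # Merge overlapping or touching loci so one lookup is enough.
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_repeat(index, chrom, position):
    """Return True if a 1-based position lies inside any repeat locus."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = position - 1  # convert to the 0-based UCSC coordinate
    i = bisect.bisect_right(starts, pos0) - 1  # last locus starting at or before pos0
    return i >= 0 and pos0 < ends[i]
```

With an index built from open('simpleRepeat.txt'), in_repeat(index, chrom, start_position) could be evaluated for each MAF row. Merging the loci up front keeps each lookup to a single binary search.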

Authors

Chen Wang

License

This project is licensed under the MIT License; see LICENSE for more information: https://github.com/wangc29/MSIpred/blob/master/LICENSE