The mvPPT website is at http://www.mvppt.club/
Precomputed scores are also available on Google Drive.
mvPPT (Pathogenicity Prediction Tool for missense variants) is a comprehensive prediction tool.
Three training sets were built from different combinations of variants from ClinVar, HGMD, UniProt, and the Genome Aggregation Database (gnomAD).
Variants were annotated with ANNOVAR.
We annotated the datasets with ANNOVAR using dbNSFP (v4.1a, see URLs) to generate some of the required prediction scores from different component tools, including InterPro domain, MutationAssessor, phyloP, GERP, phastCons, PROVEAN, and SiPhy. Mutations located in InterPro domains were recorded as 1 and all others as 0. The AFs, GFs, and AAFs of each variant in different populations were obtained from the gnomAD exomes database. If a variant was not present in the database, its AFs, AAFs, HomFs, and HetFs were set to 0 and its WtFs were set to 1. The GeVIR, VIRLoF, oe_mis_upper, HIP, and CCR scores were downloaded from their respective websites (see URLs). One-hot encoding was applied to the amino acid sequence, representing each amino acid as a binary vector of length 20 with a single non-zero value. All features were selected to provide complementary information, and they either did not require training or have publicly available training data, which allows those variants to be excluded from our data.
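The one-hot encoding and the defaults for variants missing from gnomAD can be sketched as follows. This is a minimal illustration, not the code in src/: the amino acid ordering, the function names, and the field names (AF, AAF, HomF, HetF, WtF) are assumptions made for the example, and the real feature columns may be named differently and repeated per population.

```python
# Illustrative sketch only; the ordering and all names below are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a length-20 binary vector with a single 1 at the residue's index."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AA_INDEX[residue.upper()]] = 1  # raises KeyError for non-standard residues
    return vec


def encode_sequence(seq):
    """Concatenate the one-hot vectors of every residue in seq."""
    encoded = []
    for residue in seq:
        encoded.extend(one_hot(residue))
    return encoded


# Defaults for variants absent from gnomAD exomes, as described above:
# AF, AAF, HomF, and HetF become 0, and WtF becomes 1.
MISSING_DEFAULTS = {"AF": 0.0, "AAF": 0.0, "HomF": 0.0, "HetF": 0.0, "WtF": 1.0}


def fill_missing_frequencies(record):
    """Return a copy of record with absent (None) frequency fields set to defaults."""
    filled = dict(record)
    for key, default in MISSING_DEFAULTS.items():
        if filled.get(key) is None:
            filled[key] = default
    return filled
```

For example, encode_sequence("MKV") yields a vector of length 60, and a record with no gnomAD entry ends up with a wild-type frequency of 1 and all other frequencies at 0.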
The MVP, REVEL, PrimateAI, FATHMM-XF, ClinPred, MetaSVM/MetaLR, PolyPhen2, and VEST4 scores were obtained from dbNSFP v4.1a. The M-CAP (version 1.4), MISTIC, CAPICE, ReVe, and CADD (version 1.6) scores were downloaded from their respective websites.
mvPPT was trained using the Python package LightGBM (version 2.3.1), and its parameters were tuned by Bayesian optimization (version 1.2.0). The random state was set to 1 throughout the model training process.
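A training setup of this kind might look like the sketch below. It is not the authors' training script. It assumes the bayes_opt package (bayesian-optimization) as the optimizer and scikit-learn for cross-validation, and the search space, the 5-fold ROC AUC objective, and all function names are hypothetical, since the tuned hyperparameters are not listed here. Only the fixed random state of 1 comes from the description above.

```python
# Hypothetical search space: the tuned hyperparameters are not listed in this README.
SEARCH_SPACE = {
    "num_leaves": (16, 256),
    "learning_rate": (0.01, 0.3),
    "n_estimators": (100, 1000),
}
SEED = 1  # the fixed random state mentioned above


def to_lgbm_params(num_leaves, learning_rate, n_estimators):
    """Map continuous proposals from the optimizer to valid LightGBM parameters."""
    return {
        "num_leaves": int(round(num_leaves)),
        "learning_rate": float(learning_rate),
        "n_estimators": int(round(n_estimators)),
        "random_state": SEED,
    }


def tune(X, y, init_points=5, n_iter=25):
    """Maximize 5-fold cross-validated ROC AUC (an assumed objective) and refit."""
    # Imported here so the helper above can be used without these packages.
    from bayes_opt import BayesianOptimization
    from lightgbm import LGBMClassifier
    from sklearn.model_selection import cross_val_score

    def objective(**proposal):
        model = LGBMClassifier(**to_lgbm_params(**proposal))
        return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

    optimizer = BayesianOptimization(f=objective, pbounds=SEARCH_SPACE, random_state=SEED)
    optimizer.maximize(init_points=init_points, n_iter=n_iter)
    best = to_lgbm_params(**optimizer.max["params"])
    return LGBMClassifier(**best).fit(X, y)
```

The optimizer proposes continuous values, so integer-valued parameters such as num_leaves are rounded before they reach LightGBM. Passing the same seed to both the optimizer and the classifier keeps repeated runs reproducible.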
The environment used to build mvPPT in our study can be created with:
conda create -n mvppt python==3.7
conda activate mvppt
conda install --file=requirements_conda.txt
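annotation
Annotation requires annovar and its database directory, annodb.
Because of the data size, the annodb databases have to be obtained separately: ensGene and dbnsfp35a are available through annovar, while gnomad211exoms_allpop and p6b have to be prepared by yourself from gnomad_v211 and dbnsfp41a.
gnomad_AAF.txt.gz is stored in two parts (xaa and xab); join and decompress them before annotating: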
cat xaa xab > gnomad_AAF.txt.gz
gunzip gnomad_AAF.txt.gz
bash src/anno.sh filename.vcf
python3 src/annoseq.py filenameGeneUniqAnno.txt filenameTotalGeneUniqAnnoEnsSeq.txt
predict
python3 src/predict.py filenameTotalGeneUniqAnnoEnsSeq.txt
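
The three commands above can also be chained from Python. The sketch below is only a convenience wrapper, not part of mvPPT: the function names are hypothetical, and it assumes that the intermediate file names follow the pattern shown above (the VCF base name followed by GeneUniqAnno.txt and TotalGeneUniqAnnoEnsSeq.txt) and that the scripts are run from the repository root.

```python
import subprocess
from pathlib import Path


def build_commands(vcf_path):
    """Return the three pipeline commands for one VCF file, in execution order."""
    prefix = Path(vcf_path).stem  # "filename.vcf" -> "filename"
    annotated = f"{prefix}GeneUniqAnno.txt"  # assumed output of src/anno.sh
    with_sequence = f"{prefix}TotalGeneUniqAnnoEnsSeq.txt"
    return [
        ["bash", "src/anno.sh", str(vcf_path)],
        ["python3", "src/annoseq.py", annotated, with_sequence],
        ["python3", "src/predict.py", with_sequence],
    ]


def run_pipeline(vcf_path):
    """Run each step in turn and stop at the first failure."""
    for command in build_commands(vcf_path):
        subprocess.run(command, check=True)
```

Calling run_pipeline("filename.vcf") runs annotation, sequence annotation, and prediction in order. Because check=True is set, a failed step raises an error instead of passing a missing file to the next script.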