quinlan-lab / pathoscore

pathoscore evaluates variant pathogenicity tools and scores.
MIT License

Script to make gnomad benign truth set (filtered on ExAC) and filter out variants on AA change #4

Closed: jimhavrilla closed this issue 6 years ago

jimhavrilla commented 7 years ago

The filter script is here: https://github.com/quinlan-lab/regionanalysis/blob/master/parvarfilter.py

Frequency and genes can also be filtered with https://github.com/quinlan-lab/regionanalysis/blob/master/secondfilter.py

You can use it like this:

python parvarfilter.py -x $DATA/clinvar-gnomad.txt -n clinvar -c -s patho -e gnomad -d genescreens/ad_genecards_clean.txt -f

This creates a file called $DATA/clinvar-patho-gnomad.txt (you have to add back a VCF header, but that's an easy fix).
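For anyone following along, here is a minimal sketch of the "add back a VCF header" step. It assumes the filtered rows are already in standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), which the thread does not confirm. The names `add_vcf_header` and `MINIMAL_HEADER` are hypothetical and not part of parvarfilter.py. Copying the header from the original VCF is probably the safer route, since strict validators may also expect `##INFO` definitions.

```python
# Assumption: the body rows are already in VCF column order. This header is
# the bare minimum and defines no INFO fields.
MINIMAL_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def add_vcf_header(body_path, out_path, header=MINIMAL_HEADER):
    """Write `header`, then every non-header line of `body_path`, to `out_path`."""
    with open(body_path) as src, open(out_path, "w") as dst:
        dst.write(header)
        for line in src:
            # Skip any stray header or comment lines left in the body.
            if line.startswith("#"):
                continue
            dst.write(line)
    return out_path
```

Something like `add_vcf_header("clinvar-patho-gnomad.txt", "clinvar-patho-gnomad.vcf")` would then give a file that can be compressed and indexed with the usual tools.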

python parvarfilter.py -x $DATA/gnomad-exac.txt -n gnomad -s benign -e exac -d genescreens/ad_genecards_clean.txt -f

This creates a set of gnomAD benign variants called gnomad-benign-exac.txt (gnomAD, benign set, filtered on ExAC). It filters on AA change/allele matching and, optionally, on the AD gene set.

as in: https://github.com/quinlan-lab/regionanalysis/blob/master/pathocompare.sh

brentp commented 7 years ago

that would be great to add more truth sets. if you want to do this, first read this: https://github.com/quinlan-lab/pathoscore#truth-sets

and have a look at make.sh and make.py for clinvar and then open a PR.

jimhavrilla commented 7 years ago

Do I have to use make.py and make.sh? I should be able to modify the code I posted if you want...or do you want me to clean it up and call it make.py?

brentp commented 7 years ago

create a make.sh that can be run as bash make.sh and the result is the .vcf.gz(s) you'll be adding as a truth set. if you have pathogenics and benigns, there will be 2 files. either "pathogenic" or "benign" should be in the name of the resulting .vcf.gz
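A quick way to check the naming rule above before opening a PR might look like the sketch below. The function name `truth_set_label` is hypothetical and not part of pathoscore. It assumes the match is a plain, case-sensitive substring test and that a file should carry exactly one of the two labels. How pathoscore itself reads the names is not stated in this thread.

```python
from pathlib import Path

# The two labels named in the comment above.
LABELS = ("pathogenic", "benign")


def truth_set_label(path):
    """Return 'pathogenic' or 'benign' based on the file name, else raise ValueError."""
    name = Path(path).name
    if not name.endswith(".vcf.gz"):
        raise ValueError(f"{name}: expected a .vcf.gz file")
    # Assumption: exactly one label should appear in the name.
    hits = [label for label in LABELS if label in name]
    if len(hits) != 1:
        raise ValueError(f"{name}: expected exactly one of {LABELS} in the name")
    return hits[0]
```

Under this check, a name like gnomad-benign-exac.vcf.gz passes, while clinvar-patho-gnomad.vcf.gz would be rejected because "patho" is not "pathogenic".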

jimhavrilla commented 7 years ago

gotcha

jimhavrilla commented 6 years ago

I think the benign truth set on gnomAD is more or less done at this point with the "benchmark" sets...filtering on AA change sounds like something I could maybe add in the future, but it may have to be through vcfanno for speed's sake. Perhaps we should close this issue?