jimhavrilla closed this issue 6 years ago
it would be great to add more truth sets. if you want to do this, first read this: https://github.com/quinlan-lab/pathoscore#truth-sets
then have a look at make.sh and make.py for clinvar and open a PR.
Do I have to use make.py and make.sh? I should be able to modify the code I posted if you want...or do you want me to clean it up and call it make.py?
Jim Havrilla PhD Candidate in Human Genetics, University of Utah Accelerated BS/MS in Biomedical Engineering, Drexel University '12, Concentration: Bioinformatics "Memory, comprehension, communication, motivation"
create a make.sh that can be run as bash make.sh, and the result is the .vcf.gz(s) you'll be adding as a truth set. if you have pathogenics and benigns, there will be 2 files. either "pathogenic" or "benign" should be in the name of the resulting .vcf.gz
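A rough way to sanity-check that naming convention before opening a PR, sketched in Python. The function name `classify_truth_set` is hypothetical and not part of pathoscore; how pathoscore itself inspects the file names is not shown in this thread, so treat this as an illustration of the rule only.

```python
from pathlib import Path


def classify_truth_set(path):
    """Return 'pathogenic' or 'benign' based on the file name.

    Hypothetical helper: it only encodes the convention described above,
    i.e. the output is a .vcf.gz whose name contains exactly one of the
    two labels.
    """
    name = Path(path).name.lower()
    if not name.endswith(".vcf.gz"):
        raise ValueError(f"{path}: expected a .vcf.gz file")
    # Collect whichever labels appear in the file name.
    labels = [label for label in ("pathogenic", "benign") if label in name]
    if len(labels) != 1:
        raise ValueError(
            f"{path}: name must contain exactly one of 'pathogenic' or 'benign'"
        )
    return labels[0]
```

Running it over whatever files make.sh produced should give one label per file and raise a ValueError for anything that is misnamed.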
gotcha
I think the benign truth set on gnomAD is more or less done at this point with the "benchmark" sets...filtering on AA change sounds like something I could maybe add in the future, but it may have to be through vcfanno for speed's sake. Perhaps we should close this issue?
https://github.com/quinlan-lab/regionanalysis/blob/master/parvarfilter.py is the filter script. Frequency and genes can also be filtered with https://github.com/quinlan-lab/regionanalysis/blob/master/secondfilter.py
Can use it like:
python parvarfilter.py -x $DATA/clinvar-gnomad.txt -n clinvar -c -s patho -e gnomad -d genescreens/ad_genecards_clean.txt -f
This creates a file called $DATA/clinvar-patho-gnomad.txt (you have to add back a VCF header, but that's an easy fix).
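For the missing header, something like the following Python sketch might do. It assumes the filtered .txt holds tab-separated VCF records with the eight standard columns and no header lines, which is not confirmed in this thread; `add_vcf_header` and `MINIMAL_HEADER` are hypothetical names. In practice it is probably better to pass in the header lines from the original VCF so the INFO definitions are preserved.

```python
# Minimal placeholder header (an assumption); pass the header lines of the
# source VCF instead if the INFO fields need their definitions.
MINIMAL_HEADER = (
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
)


def add_vcf_header(body_path, out_path, header_lines=MINIMAL_HEADER):
    """Write header_lines followed by the records found in body_path."""
    with open(body_path) as body, open(out_path, "w") as out:
        for line in header_lines:
            out.write(line.rstrip("\n") + "\n")
        for line in body:
            # Skip blank lines and any header lines that are already there.
            if not line.strip() or line.startswith("#"):
                continue
            out.write(line.rstrip("\n") + "\n")
```

The output here is plain text; compressing it to .vcf.gz (for example with bgzip) would be a separate step in make.sh.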
python parvarfilter.py -x $DATA/gnomad-exac.txt -n gnomad -s benign -e exac -d genescreens/ad_genecards_clean.txt -f
This creates a set of gnomAD benigns called gnomad-benign-exac.txt (gnomAD, benign set, filtered on ExAC). It filters on AA change/allele matching and, optionally, on the AD gene set.
as in: https://github.com/quinlan-lab/regionanalysis/blob/master/pathocompare.sh