mhguo1 / TRAPD

Burden testing against public controls
MIT License

--pop option with gnomad r.2 data #10

Open jenkelly10 opened 4 years ago

jenkelly10 commented 4 years ago

Hello! Thank you for this wonderful program! I think it will be invaluable for working with rare variant data! I have a quick question... when running the script for counting controls and working with the gnomad.exome.r.2.1.1.vcf data, how do we indicate a subpopulation of the data? For instance, if I want to work only with the control-nfe data or the non-cancer_nfe data rather than all NFE samples? Thanks so much for your time and consideration!

dalmiaa commented 4 years ago

Same question. Did you find a solution to this?

misrak commented 2 years ago

Do you know how to restrict the analysis to NFE samples only? I do not understand how that is done with the downloaded gnomAD VCF file.

misrak commented 2 years ago

One plausible solution I could think of: if the allele is present in the population at a given position, then the allele number should be greater than zero, i.e. controls_NFE_AN > 0 & (controls_NFE_AC >= 0 | controls_NFE_AF >= 0). This could be used to subset the VCF file to the variants present in that population.
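A minimal Python sketch of this kind of INFO-based subsetting, for anyone who wants to try it. The key names `controls_NFE_AC` and `controls_NFE_AN` are copied from the expression above and are assumptions; the actual INFO keys in a given gnomAD release may be spelled differently, so check the VCF header first. The helper names are hypothetical and the example records are made up. Note that the sketch requires AC > 0, which is a stricter reading than the expression above, since AC >= 0 holds for every record.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=1;AN=10;FLAG' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, _, value = entry.partition("=")
        info[key] = value  # flags without '=' map to an empty string
    return info


def observed_in_population(info, ac_key, an_key):
    """True if the population was genotyped (AN > 0) and carries the allele (AC > 0)."""
    try:
        an = int(info[an_key])
        # AC may hold one value per alternate allele in multi-allelic records
        ac_values = [int(x) for x in info[ac_key].split(",")]
    except (KeyError, ValueError):
        return False  # key missing or not numeric: treat as not observed
    return an > 0 and any(ac > 0 for ac in ac_values)


def filter_vcf_lines(lines, ac_key, an_key):
    """Yield header lines unchanged and keep only records seen in the population."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        if observed_in_population(parse_info(fields[7]), ac_key, an_key):
            yield line


# Made-up records for illustration only; key names are assumptions (see above).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tAC=5;AN=100;controls_NFE_AC=2;controls_NFE_AN=40\n",
    "1\t200\t.\tC\tT\t.\tPASS\tAC=3;AN=100;controls_NFE_AC=0;controls_NFE_AN=40\n",
]
kept = list(filter_vcf_lines(example, "controls_NFE_AC", "controls_NFE_AN"))
```

For a real file, the same generator could be fed lines from `gzip.open(path, "rt")`. Keep in mind that this only drops variants absent from the subpopulation; it does not by itself make downstream tools count carriers from the subpopulation-specific fields.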

wanlee-yang commented 2 years ago

I found an easy way to specify the subpopulation. If you are using gnomAD data, add the parameter '-d gnomad' in the "Counting carriers in public control cohorts" step; if you are using ExAC data, add '-d exac' in the same step. If the parameter is not given, the code treats the input data as generic, which makes the --pop parameter have no effect. I hope this is helpful.
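As a rough illustration of the suggestion above, here is a small Python sketch that assembles such a command line. Only `-d` and `--pop` come from this thread. The script name `count_controls.py`, the `-v`/`-s`/`-o` flags, and all file names are assumptions or placeholders, so check the TRAPD README before relying on them. Whether `--pop` accepts subset labels such as the controls-only or non-cancer NFE groups is not established in this thread.

```python
import shlex


def build_count_controls_command(vcf_path, snpfile_path, out_path,
                                 database="gnomad", pop="NFE"):
    """Assemble the argument list; nothing is executed here."""
    return [
        "python", "count_controls.py",  # script name is an assumption
        "-v", vcf_path,                 # assumed flag: input sites VCF
        "-s", snpfile_path,             # assumed flag: SNP file from the earlier step
        "-o", out_path,                 # assumed flag: output counts file
        "-d", database,                 # from this thread: 'gnomad' or 'exac'
        "--pop", pop,                   # only takes effect when -d is not generic
    ]


# Placeholder file names, for illustration only.
cmd = build_count_controls_command(
    "gnomad.exomes.sites.vcf.gz", "snpfile.txt", "controlcounts.txt"
)
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed against the installed version of the script.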