hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License

Losing almost all variants after restricting to reference step in harmonization. #19

Closed hsmith9002 closed 1 year ago

hsmith9002 commented 1 year ago

Hi,

I am using your harmonization script, and it runs without error, but it is filtering out almost all of the variants in my GWAS summary stats. I start with ~45M variants and end up with ~400K (<1% of the original data) after the "restricting to reference" step. The data is hg38, so I don't lose any variants in the liftover step.

This is the code I am using:

-gwas_file step2_Ischemic_Heart_Disease.txt.gz \
-liftover hg19ToHg38.over.chain.gz \
-snp_reference_metadata variant_metadata.txt.gz METADATA \
-output_column_map ID variant_id \
-output_column_map ALLELE0 non_effect_allele \
-output_column_map ALLELE1 effect_allele \
-output_column_map BETA effect_size \
-output_column_map TEST test \
-output_column_map LOG10P pvalue \
-output_column_map CHROM chromosome \
--chromosome_format \
-output_column_map N sample_size \
-output_column_map SE standard_error \
-output_column_map INFO info \
-output_column_map CHISQ chisq \
-output_column_map EXTRA extra \
-output_column_map GENPOS position \
-output_column_map A1FREQ frequency \
-output_order variant_id panel_variant_id chromosome position effect_allele non_effect_allele frequency pvalue test effect_size chisq standard_error sample_size \
-output ./${PHENOTYPE}_ADDITIVE.txt.gz

sample_summary_stats.txt

I have also attached a sample of what my summary stats file looks like. These were generated using REGENIE V3.

Thank you! Harry

hsmith9002 commented 1 year ago

UPDATE: I ran the harmonization script without the liftover flag, and I am now retaining ~50% of the original variants after the "restricting to reference" step. Does this seem like expected behavior, or is there something I am missing before moving on to imputation?

Harry

Fnyasimi commented 1 year ago

Hi Harry,

It looks like your summary stats are already in build b38, so you don't need to do the liftover. Always confirm the genome build before using the liftover.
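One rough way to confirm the build is to check what fraction of the GWAS (chromosome, position) pairs also appear in the reference metadata: a high fraction suggests the two share a build, while a very low one hints at a mismatch. The sketch below is only a heuristic and is not part of the harmonization script. The helper names are hypothetical, the GWAS column names (CHROM, GENPOS) are taken from the command above, and the reference column names (chromosome, position) and the delimiters are assumptions that should be checked against the actual files.

```python
import csv
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def normalize_chrom(value):
    """Strip an optional 'chr' prefix so '1' and 'chr1' compare equal."""
    value = value.strip()
    return value[3:] if value.lower().startswith("chr") else value


def load_positions(handle, chrom_col, pos_col, delimiter):
    """Collect the set of (chromosome, position) pairs from a delimited table."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {(normalize_chrom(row[chrom_col]), int(row[pos_col])) for row in reader}


def position_overlap(gwas_handle, ref_handle, gwas_delim=" ", ref_delim="\t"):
    """Fraction of distinct GWAS positions that also occur in the reference.

    Column names and delimiters are assumptions (see the note above).
    Everything is held in memory, so for very large files it is safer to
    run this on a single chromosome or on a sample of rows.
    """
    gwas = load_positions(gwas_handle, "CHROM", "GENPOS", gwas_delim)
    ref = load_positions(ref_handle, "chromosome", "position", ref_delim)
    return len(gwas & ref) / len(gwas) if gwas else 0.0


if __name__ == "__main__":
    # Tiny in-memory tables standing in for the real files.
    gwas_demo = io.StringIO("CHROM GENPOS ID\n1 100 a\n1 200 b\n2 300 c\n2 400 d\n")
    ref_demo = io.StringIO(
        "chromosome\tposition\tid\nchr1\t100\tx\nchr2\t300\ty\nchr3\t5\tz\n"
    )
    print(position_overlap(gwas_demo, ref_demo))
```

With real data the handles would come from open_text, for example `position_overlap(open_text("step2_Ischemic_Heart_Disease.txt.gz"), open_text("variant_metadata.txt.gz"))`. This only compares positions, not alleles, so it says nothing about variants dropped for allele mismatches.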