hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License

Output problem with gwas_parsing.py #3

Closed lauraand1705 closed 4 years ago

lauraand1705 commented 4 years ago

Hi. I am having trouble with a liftover with gwas_parsing.py. My input has the columns (example with first line):

CHROM POS ID REF ALT A1 A1_FREQ MACH_R2 TEST OBS_CT BETA SE Z_STAT P A2
10 90127 10:90127:C:T;rs79817489 C T T 0.0741934 0.918957 ADD 3657 -0.040217 0.1068 -0.376564 0.706497 C

I use the command:

python .../gwas_parsing.py \
  -gwas_file ../test.tab.gz \
  -output_column_map ID variant_id \
  -output_column_map A2 non_effect_allele \
  -output_column_map A1 effect_allele \
  -output_column_map A1_FREQ freq \
  -output_column_map BETA effect_size \
  -output_column_map P pvalue \
  -output_column_map SE standard_error \
  -output_column_map CHROM chromosome \
  -output_column_map POS position \
  -output_column_map OBS_CT sample_size \
  -output_order variant_id non_effect_allele effect_allele pvalue standard_error chromosome position freq sample_size effect_size \
  -liftover hg19ToHg38.over.chain.gz \
  -output test.hg38.tab

But I get an output with NAs in "chromosome" and "position", e.g.:

variant_id non_effect_allele effect_allele pvalue standard_error chromosome position freq sample_size effect_size
10:90127:C:T;rs79817489 C T 0.706497 0.1068 NA NA 0.07419339999999999 3657 -0.040217

I have tried adding "chr" before the chromosome number in my input file, but I get the same result. Do you have any suggestions? Thank you for your help.

Best regards, Laura.

gustavahlberg commented 4 years ago

It seems that gwas_parsing.py can't parse the 'X' chromosome since it's not an int.

gustavahlberg commented 4 years ago

In the Genomics.sort function

Heroico commented 4 years ago

Hi Laura and Gustav,

Indeed, we don't support chromosome X. The easy workaround for now is to exclude these variants.
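One way to apply this workaround is to filter the input before running gwas_parsing.py. The sketch below is not part of the repository. It assumes a tab-separated, gzip-compressed input with a header column named CHROM (as in Laura's example), and the function names `is_autosome`, `filter_autosomes`, and `filter_file` are made up for illustration.

```python
import gzip


def is_autosome(chrom):
    """Return True if the value is an integer chromosome, with or without a 'chr' prefix."""
    value = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return value.isdecimal()


def filter_autosomes(lines, chrom_column="CHROM", sep="\t"):
    """Yield the header line, then every row whose chromosome is an integer."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to yield
    header = header.rstrip("\n")
    index = header.split(sep).index(chrom_column)
    yield header
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if is_autosome(line.split(sep)[index]):
            yield line


def filter_file(in_path, out_path, chrom_column="CHROM"):
    """Copy a gzipped table, dropping rows with non-integer chromosomes (X, Y, MT, ...)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in filter_autosomes(src, chrom_column):
            dst.write(line + "\n")
```

For example, `filter_file("../test.tab.gz", "../test.autosomes.tab.gz")` would write a copy without X variants (the output file name is hypothetical), which can then be passed to -gwas_file.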

Regarding the chr string prefix in the chromosome: the script does work with chromosome values like chr1. If you add the --chromosome_format argument, the parsing script will add the chr prefix on the fly when reading values.

Laura, do you get NA in all chromosome and position entries? What's your usage scenario? If you plan to integrate these GWAS with PrediXcan models, a -snp_reference_metadata argument is required ($DATA/reference_panel_1000G/variant_metadata.txt.gz METADATA in the tutorial).
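To answer the first question quickly, the NA entries in the output can be counted with a few lines of Python. This is only a sketch: it assumes the output is an uncompressed, tab-separated file with the column names from the -output_order above, and `count_missing` is a hypothetical helper, not something provided by the repository.

```python
def count_missing(lines, columns=("chromosome", "position"), sep="\t", missing="NA"):
    """Return (number of data rows, {column: number of rows where it equals `missing`})."""
    lines = iter(lines)
    counts = {name: 0 for name in columns}
    header = next(lines, None)
    if header is None:
        return 0, counts  # empty input
    names = header.rstrip("\n").split(sep)
    indices = {name: names.index(name) for name in columns}
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        total += 1
        for name, i in indices.items():
            if fields[i] == missing:
                counts[name] += 1
    return total, counts


def count_missing_in_file(path, **kwargs):
    """Convenience wrapper for a plain-text output file such as test.hg38.tab."""
    with open(path) as handle:
        return count_missing(handle, **kwargs)
```

If the counts equal the total number of rows, every variant lost its coordinates; if only some rows are affected, it is worth checking which chromosomes they come from.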

lauraand1705 commented 4 years ago

Hi Alvaro, Thanks for your reply. It helped to exclude chr X variants. Best regards, Laura.

Heroico commented 4 years ago

Hi Laura,

I'm happy to hear you solved your issue. Best,

Alvaro