hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License

Error running run_coloc.py : dataset 1: missing required element(s) snp #23

Open JieWu2012 opened 9 months ago

JieWu2012 commented 9 months ago

Hi, when I run run_coloc.py with the following command:

python summary-gwas-imputation/src/run_coloc.py -keep_intermediate_folder -gwas_mode bse -gwas test1.txt -eqtl_mode bse -eqtl test2.txt -gwas_sample_size 149461 -eqtl_sample_size 670 -p1 1e-05 -p2 1e-04 -p12 1e-06 -parsimony 1 -output test_output.txt

I get this error:

INFO - Loading gwas
Level 9 - sanitizing gwas
INFO - Beggining process
Level 9 - Processing gene ENSG00000227232.5
Level 9 - sanitizing eqtl
WARNING - R[write to console]: Error in check_dataset(d = dataset1, 1) : dataset 1: missing required element(s) snp

INFO - Exception running coloc:
Traceback (most recent call last):
  File "/oak/stanford/scg/lab_lilab/jwu/vitiligo/GWAS/summary-gwas-imputation/src/genomic_tools_lib/external_tools/coloc/Coloc.py", line 169, in _coloc
    c = coloc_r(dataset1=d1, dataset2=d2, p1=p1, p2=p2, p12=p12)
  File "/home/jiewu23/.conda/envs/Coloc/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/jiewu23/.conda/envs/Coloc/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 124, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/home/jiewu23/.conda/envs/Coloc/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/home/jiewu23/.conda/envs/Coloc/lib/python3.7/site-packages/rpy2/rinterface.py", line 810, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in check_dataset(d = dataset1, 1) : dataset 1: missing required element(s) snp

(The test data are based on https://github.com/hakyimlab/summary-gwas-imputation/wiki/Running-Coloc)

test1.txt:

panel_variant_id     effect_size  standard_error  frequency  sample_size
chr1_731718_T_C_b38  0.039186     0.033355        0.1336     42921
chr1_734349_T_C_b38  0.041351     0.034082        0.128868   42921
chr1_752566_G_A_b38  -0.018285    0.021551        0.845492   149758

test2.txt:

gene_id            variant_id          tss_distance  ma_samples  ma_count  maf        pval_nominal  slope   slope_se
ENSG00000227232.5  chr1_13550_G_A_b38  -16003        19          19        0.0141791  0.84          0.15    0.07
ENSG00000227232.5  chr1_14671_G_C_b38  -14882        17          17        0.0126866  0.17          -0.028  0.58
ENSG00000227232.5  chr1_14677_G_A_b38  -14876        69          69        0.0514925  0.99          -0.99   0.99

Could you please help me with this?
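One thing worth checking in the test data: the rows shown for test1.txt and test2.txt do not share any variant IDs. Whether an empty overlap is what produces the "missing required element(s) snp" error is an assumption here and has not been confirmed against the script's source. The minimal sketch below only counts the overlap between the two tables as posted. The helper name `variant_ids` is hypothetical and not part of summary-gwas-imputation.

```python
# Sample rows copied from test1.txt and test2.txt as posted in this issue.
GWAS_TEXT = """panel_variant_id effect_size standard_error frequency sample_size
chr1_731718_T_C_b38 0.039186 0.033355 0.1336 42921
chr1_734349_T_C_b38 0.041351 0.034082 0.128868 42921
chr1_752566_G_A_b38 -0.018285 0.021551 0.845492 149758
"""

EQTL_TEXT = """gene_id variant_id tss_distance ma_samples ma_count maf pval_nominal slope slope_se
ENSG00000227232.5 chr1_13550_G_A_b38 -16003 19 19 0.0141791 0.84 0.15 0.07
ENSG00000227232.5 chr1_14671_G_C_b38 -14882 17 17 0.0126866 0.17 -0.028 0.58
ENSG00000227232.5 chr1_14677_G_A_b38 -14876 69 69 0.0514925 0.99 -0.99 0.99
"""


def variant_ids(text, column):
    """Return the set of values found in `column` of a whitespace-delimited table.

    Hypothetical helper for illustration only.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    index = header.index(column)
    return {line.split()[index] for line in lines[1:]}


gwas_ids = variant_ids(GWAS_TEXT, "panel_variant_id")
eqtl_ids = variant_ids(EQTL_TEXT, "variant_id")
shared = gwas_ids & eqtl_ids

print(f"GWAS variants: {len(gwas_ids)}, eQTL variants: {len(eqtl_ids)}, shared: {len(shared)}")
```

If the full files also have no variants in common for a gene, it may be worth testing with inputs that cover the same region before looking further.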

xinranxu0930 commented 1 month ago

Although I didn't use this script, I encountered the same issue when calling coloc.abf in Python. I modified my code, and it now runs successfully:

result <- coloc.abf(
    dataset1 = list(snp = df$rsID, pvalues = df$p_value_1, type = "quant", N = 104),
    dataset2 = list(snp = df$rsID, pvalues = df$p_value_2, type = "quant", N = 104),
    MAF = df$EAF
)

I hope this helps you!
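For anyone building the datasets from Python, the same idea can be sketched as a small pre-flight check that runs before anything is handed to R. This is an illustration under assumptions, not the script's own code: the required keys are inferred only from the error message and the snippet above, the helper names `build_dataset` and `missing_elements` are hypothetical, and the conversion of the dict into an R list for rpy2 is left out.

```python
# Assumption: this key list is inferred from the error message in this issue
# and from the working call above, not from coloc's documentation.
REQUIRED_KEYS = ("snp", "type", "N")


def build_dataset(variant_ids, betas, standard_errors, sample_size, trait_type="quant"):
    """Assemble a coloc-style dataset as a plain dict that includes `snp`.

    Hypothetical helper for illustration only.
    """
    if not (len(variant_ids) == len(betas) == len(standard_errors)):
        raise ValueError("variant_ids, betas and standard_errors must have equal length")
    return {
        "snp": list(variant_ids),
        "beta": list(betas),
        "varbeta": [se ** 2 for se in standard_errors],  # variance = SE squared
        "type": trait_type,
        "N": sample_size,
    }


def missing_elements(dataset, required=REQUIRED_KEYS):
    """Return the required keys that are absent, None, or an empty list."""
    missing = []
    for key in required:
        value = dataset.get(key)
        if value is None or (isinstance(value, list) and not value):
            missing.append(key)
    return missing


# Values taken from test1.txt and the -gwas_sample_size argument in this issue.
gwas = build_dataset(
    ["chr1_731718_T_C_b38", "chr1_734349_T_C_b38", "chr1_752566_G_A_b38"],
    [0.039186, 0.041351, -0.018285],
    [0.033355, 0.034082, 0.021551],
    sample_size=149461,
)
without_snp = {key: value for key, value in gwas.items() if key != "snp"}

print(missing_elements(gwas))
print(missing_elements(without_snp))
```

A check like this reports the missing key on the Python side, which is easier to read than the RRuntimeError raised through rpy2.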