privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Problem with snp_match() #477

Closed: girarn closed this issue 4 months ago

girarn commented 4 months ago

Hi,

I am trying to match SNPs between my GWAS summary statistics and my genotype data using snp_match() with join_by_pos set to FALSE, but I get an error saying that not enough variants have been matched.

My data looks like this:

str(sumstats)
'data.frame': 1429545 obs. of 10 variables:
 $ rsid   : chr "rs7899632" "rs3750595" "rs10786405" "rs41307056" ...
 $ chr    : int 10 10 10 10 10 10 10 10 10 10 ...
 $ pos    : int 100000625 100004906 100005282 100007694 100008436 100008785 100008926 100011219 100012890 100013438 ...
 $ a0     : chr "g" "c" "c" "g" ...
 $ a1     : chr "a" "a" "t" "a" ...
 $ beta   : num 0.028 -0.0289 -0.029 0.0339 0.0224 ...
 $ beta_se: num 0.00503 0.00491 0.00491 0.01389 0.00509 ...
 $ N      : int 163528 181391 181337 145947 181311 90238 94784 155311 170685 180798 ...
 $ p      : num 2.75e-08 3.83e-09 3.65e-09 1.47e-02 1.02e-05 ...
 $ n_eff  : int 163528 181391 181337 145947 181311 90238 94784 155311 170685 180798 ...

str(map)
'data.frame': 600267 obs. of 5 variables:
 $ chr : int 1 1 1 1 1 1 1 1 1 1 ...
 $ rsid: chr "rs3131972" "rs11240777" "rs4970383" "rs4475691" ...
 $ pos : int 752721 798959 838555 846808 854250 861808 873558 888659 891945 894573 ...
 $ a1  : chr "A" "A" "A" "T" ...
 $ a0  : chr "G" "G" "C" "C" ...

And the output I get is:

df_beta <- snp_match(sumstats, map, join_by_pos=F)
1,429,545 variants to be matched.
0 ambiguous SNPs have been removed.
0 variants have been matched; 0 were flipped and 0 were reversed.
Error: Not enough variants have been matched.

The function tries to match 1,429,545 SNPs and finds zero matches. However, a simple match on rsid shows that 456,570 SNPs are present in both my GWAS summary statistics and my genotype data, so they should be matched. Is there something wrong with how I formatted sumstats, or something I should change for the function to work?

Thanks in advance!

privefl commented 4 months ago

You probably need to toupper() the alleles in sumstats.
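
A minimal sketch of that fix, assuming the lowercase a0/a1 values shown in str(sumstats) are what prevents the match against the uppercase alleles in map. The toy data frame below is only a stand-in built from the first three rows of the str() output; with the real data, only the two toupper() lines are needed before calling snp_match() again.

```r
# Toy stand-in for `sumstats`, using the first three rows shown in the
# str() output above (illustration only, not the real data).
sumstats <- data.frame(
  rsid = c("rs7899632", "rs3750595", "rs10786405"),
  chr  = c(10L, 10L, 10L),
  pos  = c(100000625L, 100004906L, 100005282L),
  a0   = c("g", "c", "c"),
  a1   = c("a", "a", "t"),
  beta = c(0.028, -0.0289, -0.029),
  stringsAsFactors = FALSE
)

# Convert both allele columns to upper case so they use the same
# coding as the alleles in `map` ("A", "C", "G", "T").
sumstats$a0 <- toupper(sumstats$a0)
sumstats$a1 <- toupper(sumstats$a1)

# Sanity check: no lowercase allele codes should remain.
stopifnot(all(c(sumstats$a0, sumstats$a1) %in% c("A", "C", "G", "T")))

# Then retry the original call (needs bigsnpr and the real `map`):
# df_beta <- snp_match(sumstats, map, join_by_pos = FALSE)
```

If the match count is still low after this, comparing unique(sumstats$a0) with unique(map$a0) is a quick way to spot any remaining difference in allele coding.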

girarn commented 4 months ago

Thank you very much for the help. I can't believe I made such a simple mistake.