jianyangqt / gcta

GCTA software
GNU General Public License v3.0

Error: there are too many SNPs that have large difference in allele frequency #60

Open shreya2031 opened 7 months ago

shreya2031 commented 7 months ago

Hi @anglixue,

I have been trying to run mtCOJO on one trait (GWAS summary statistics available here: https://figshare.com/articles/dataset/scz2022/19426775?file=34517828) while adjusting for another trait (GWAS summary statistics available here: https://conservancy.umn.edu/handle/11299/241912, filename: GSCAN_CigDay_2022_GWAS_SUMMARY_STATS_EUR.txt.gz) using GCTA v1.94.1, and this is the error message I received:

Error: there are too many SNPs that have large difference in allele frequency. Please check the GWAS summary data.
An error occurs, please check the options or data

This is the command I used:

./gcta --bfile /home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes --mtcojo-file data_list_scz_cigday.txt --ref-ld-chr /home/gcta/eur_w_ld_chr/ --w-ld-chr /home/gcta/eur_w_ld_chr/ --out mtcojo_scz_cigday

This is the log file:


Accepted options:
--bfile /home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes
--mtcojo-file data_list_scz_cigday.txt
--ref-ld-chr /home/gcta/eur_w_ld_chr/
--w-ld-chr /home/gcta/eur_w_ld_chr/
--out mtcojo_scz_cigday

Reading PLINK FAM file from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
2504 individuals to be included from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
Reading PLINK BIM file from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].
80845844 SNPs to be included from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].

Reading GWAS summary data from [data_list_scz_cigday.txt] ...
7341181 SNPs in common between the target trait and the covariate trait(s).
Filtering out SNPs with multiple alleles or missing value ...
864 SNPs have missing value or mismatched alleles. These SNPs have been saved in [mtcojo_scz_cigday.badsnps].
7340317 SNPs are retained after filtering.
There are 3888 genome-wide significant SNPs with p < 5.0e-08.

Reading PLINK BED file from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed] in SNP-major format ...
Genotype data for 2504 individuals and 3888 SNPs to be included from [/home/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed].
Calculating allele frequencies ...
Checking the difference in allele frequency between the GWAS summary datasets and the LD reference sample...
5478219 SNP(s) have large difference of allele frequency between the GWAS summary data and the reference sample. These SNPs have been saved in [mtcojo_scz_cigday.freq.badsnps].
Error: there are too many SNPs that have large difference in allele frequency. Please check the GWAS summary data.
An error occurs, please check the options or data

Could you please help me solve this issue?

Thanks!
Shreya

ShouyeLiu commented 3 months ago

Have you checked how the effect allele is defined in your summary data? The effect alleles may be mismatched between the GWAS summary data and the LD reference. In the .ma format, A1 is the effect allele (see https://yanglab.westlake.edu.cn/software/gcta/#COJO).
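One rough way to test this before rerunning mtCOJO is to compare a sample of SNPs by hand. The log above flags 5478219 of 7340317 retained SNPs, roughly three quarters, which looks more like a systematic problem than a handful of bad variants. The Python sketch below is not GCTA's own check. It assumes you have already pulled A1, A2 and freq from the summary file and the matching alleles and frequency from the reference panel; the function names and the 0.2 cutoff are illustrative assumptions, so adjust them to your setup.

```python
from collections import Counter


def classify_snp(gwas_a1, gwas_a2, gwas_freq, ref_a1, ref_a2, ref_freq, max_diff=0.2):
    """Label one SNP by how its GWAS frequency compares with the reference.

    max_diff is an illustrative cutoff (an assumption), not a value read from GCTA.
    """
    gwas = (gwas_a1.upper(), gwas_a2.upper())
    ref = (ref_a1.upper(), ref_a2.upper())

    if gwas == ref:
        aligned = gwas_freq          # same allele order, frequency comparable as-is
    elif gwas == (ref[1], ref[0]):
        aligned = 1.0 - gwas_freq    # alleles swapped, express freq for the reference A1
    else:
        return "allele_mismatch"     # e.g. strand issue or a different variant

    if abs(aligned - ref_freq) <= max_diff:
        return "ok"
    # If the complement matches, the freq column probably describes the
    # other allele rather than the one listed as A1.
    if abs((1.0 - aligned) - ref_freq) <= max_diff:
        return "freq_for_other_allele"
    return "freq_differs"


def summarise(rows, max_diff=0.2):
    """Count labels over an iterable of
    (gwas_a1, gwas_a2, gwas_freq, ref_a1, ref_a2, ref_freq) tuples."""
    return Counter(classify_snp(*row, max_diff=max_diff) for row in rows)


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        ("A", "G", 0.80, "A", "G", 0.20),
        ("A", "G", 0.25, "A", "G", 0.20),
        ("G", "A", 0.80, "A", "G", 0.20),
    ]
    print(summarise(rows))
```

If most SNPs come back as "freq_for_other_allele", the freq column is probably reported for the non-effect allele and needs to be converted to 1 - freq (or A1 and A2 swapped along with the sign of b) when building the .ma file. If most come back as "freq_differs", flipping will not help, and it may be worth checking whether the reference sample matches the ancestry of the GWAS.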