jianyangqt / gcta

GCTA software
GNU General Public License v3.0

too many SNPs that have large difference in allele frequency #95

Open aydanasg opened 2 months ago

aydanasg commented 2 months ago

Hi there!

I am trying to run an mtCOJO analysis on Alzheimer's disease, conditioning on small vessel disease, with this command:

gcta64 \
  --mbfile 1000G_EUR_Phase3.mtcojo_ref_data.txt \
  --mtcojo-file summary_list/AD_Jansen2019_SVD_Sargurupremraj2022.mtCOJO_summary_data.list \
  --ref-ld-chr LDscore/ \
  --w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/ \
  --out results/AD_Jansen2019_SVD_Sargurupremraj2022_mtcojo_result

This is the error message I get:

Reading the PLINK FAM files ....
489 individuals have been included from the PLINK FAM files.
Reading the PLINK BIM files ...
9997231 SNPs to be included from PLINK BIM files.

Reading GWAS summary data from [summary_list/AD_Jansen2019_SVD_Sargurupremraj2022.mtCOJO_summary_data.list] ...
6327523 SNPs in common between the target trait and the covariate trait(s).
Filtering out SNPs with multiple alleles or missing value ...
1144 SNPs have missing value or mismatched alleles.
These SNPs have been saved in [results/AD_Jansen2019_SVD_Sargurupremraj2022_mtcojo_result.badsnps].
6326379 SNPs are retained after filtering.
There are 1328 genome-wide significant SNPs with p < 5.0e-08.

Reading PLINK BED files ...
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.1.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.4.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.7.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.9.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.11.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.12.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.18.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.19.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.20.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/user_analysed_data/Aydan/vasculature_disease_epi/mtCOJO/1000G_EUR_Phase3_plink/1000G.EUR.QC.21.bed, no SNPs retained on this chromosome.
Genotype data for 489 individuals and 1328 SNPs have been included.
Calculating allele frequencies ...
Checking the difference in allele frequency between the GWAS summary datasets and the LD reference sample...
2469875 SNP(s) have large difference of allele frequency between the GWAS summary data and the reference sample.
These SNPs have been saved in [results/AD_Jansen2019_SVD_Sargurupremraj2022_mtcojo_result.freq.badsnps].
Error: there are too many SNPs that have large difference in allele frequency. Please check the GWAS summary data.
An error occurs, please check the options or data

I know that the allele frequency column is definitely correct, so I was wondering if you have any advice on what the issue could be. I am also wondering why so many chromosomes retain no SNPs.
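The log above flags 2469875 of the 6326379 retained SNPs (roughly 39%) for allele frequency differences. One thing that may be worth ruling out is that the frequency column refers to the other allele than the one listed as A1. The following is a minimal Python sketch of that check, not GCTA's own logic: the function names are hypothetical, the frequency pairs are made-up illustrations, and the 0.2 threshold is an assumed value that should be checked against the GCTA documentation.

```python
def classify_freq(gwas_freq, ref_freq, threshold=0.2):
    """Label one SNP by comparing its GWAS and reference allele frequencies.

    threshold is an assumed cut-off for this illustration only.
    """
    # Frequencies agree for the same allele.
    if abs(gwas_freq - ref_freq) <= threshold:
        return "ok"
    # Frequencies agree only after swapping to the other allele,
    # which hints that freq may be reported for A2 instead of A1.
    if abs((1.0 - gwas_freq) - ref_freq) <= threshold:
        return "possibly_flipped"
    # Neither orientation matches the reference sample.
    return "discordant"


def summarise(pairs, threshold=0.2):
    """Count the labels over (gwas_freq, ref_freq) pairs."""
    counts = {"ok": 0, "possibly_flipped": 0, "discordant": 0}
    for gwas_freq, ref_freq in pairs:
        counts[classify_freq(gwas_freq, ref_freq, threshold)] += 1
    return counts


if __name__ == "__main__":
    # Made-up (GWAS, reference) frequency pairs for illustration.
    example = [(0.30, 0.25), (0.90, 0.10), (0.90, 0.50)]
    print(summarise(example))
```

If most flagged SNPs fall into the "possibly_flipped" group, the frequency column is probably keyed to the wrong allele. A large "discordant" group points elsewhere, for example to an ancestry mismatch with the reference sample.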

Thanks in advance! Aydan

longmanz commented 2 months ago

Hi, it seems that several of your 1000G chromosomes return the message "no SNPs retained on this chromosome". You might need to check whether the rsIDs in your 1000G data really match the rsIDs in your GWAS file. Sometimes the SNPs in a GWAS file are named "chr:pos" instead of by a real rsID, which will cause this issue.
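The ID check described above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the variant ID is taken from the second column of the PLINK .bim file, the GWAS summary file is assumed to have one header line with the SNP ID in its first column, and the function names and sample rows are hypothetical.

```python
import re

# Matches IDs such as "1:2000", "chr1:2000" or "1_2000" (assumed naming styles).
CHR_POS = re.compile(r"^(chr)?[0-9XYM]+[:_][0-9]+", re.IGNORECASE)


def read_ids(lines, column):
    """Collect the identifier found in the given whitespace-separated column."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            ids.add(fields[column])
    return ids


def compare_ids(bim_lines, gwas_lines):
    """Summarise how well SNP IDs in a .bim file match those in a GWAS file."""
    bim_ids = read_ids(bim_lines, 1)      # .bim: second column is the variant ID
    gwas_rows = list(gwas_lines)[1:]      # assumption: exactly one header line
    gwas_ids = read_ids(gwas_rows, 0)     # assumption: SNP ID is the first column
    return {
        "bim": len(bim_ids),
        "gwas": len(gwas_ids),
        "shared": len(bim_ids & gwas_ids),
        "gwas_chr_pos_style": sum(1 for s in gwas_ids if CHR_POS.match(s)),
    }


if __name__ == "__main__":
    # Made-up rows for illustration; real use would pass open file handles.
    bim = ["1 rs123 0 1000 A G", "1 rs456 0 2000 C T"]
    gwas = [
        "SNP A1 A2 freq b se p N",
        "rs123 A G 0.30 0.01 0.005 0.04 10000",
        "1:2000 C T 0.10 0.02 0.006 0.001 10000",
    ]
    print(compare_ids(bim, gwas))
```

A small "shared" count relative to the file sizes, or a non-zero "gwas_chr_pos_style" count, would suggest the naming mismatch described above.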

aydanasg commented 2 months ago

I checked, and both my GWAS and the 1000G data have rsIDs. Could the problem be that I am using 1000G_EUR_Phase3_plink as the reference individual-level genotypes for LD estimation, instead of the genotype data from the relevant GWAS studies (these are not publicly available)?