wouterpeyrot / CCGWAS


Trouble with recognizing numeric data? #5

Closed: ghost closed this issue 2 years ago

ghost commented 3 years ago

I am trying to replicate the findings of the CCGWAS paper, but I run into a problem with the SCZ and BIP dataset. The test set, which is a portion of this dataset, runs without a problem, but the full dataset gives: "Error in round(max(stats$OR), digits = 2) : non-numeric argument to mathematical function". Can you advise what I should do?

ghost commented 3 years ago

Just to update anyone who may be trying to replicate the SCZ vs BIP analysis: this was caused by one of the summary statistics files containing some extremely large ORs (e.g. 1e48), which led R to treat the column as character. To replicate the published findings, we could safely disregard those values. However, we note that the method filters out odds ratios greater than 2. I wonder whether this could be made adjustable when needed, as there are several cases where an OR greater than 2 or less than 0.4 is entirely genuine and in fact a major contributor to disease, e.g. some AD cohorts, where APOE4 and APOE2 have strong risk and protective effects, respectively.
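Before running the analysis, it can help to check which OR entries are not usable numbers. The sketch below is a hypothetical pre-check (`parse_or_column` is not part of CCGWAS), assuming the OR column has already been read as a list of strings. It keeps entries that parse as finite, positive floats and reports the rest (e.g. "NA", or values such as "1e400" that overflow to infinity).

```python
import math


def parse_or_column(raw_values):
    """Split raw OR strings into usable floats and rejected entries.

    Hypothetical helper, not part of CCGWAS. Returns two lists of
    (row_index, value) pairs: parsed finite positive ORs, and rejected raw text.
    """
    parsed, rejected = [], []
    for i, text in enumerate(raw_values):
        try:
            value = float(text)
        except ValueError:
            # Not a number at all, e.g. "NA" or an empty string.
            rejected.append((i, text))
            continue
        # An OR must be a finite, strictly positive number; this also catches
        # "nan" and values that overflow to inf.
        if not math.isfinite(value) or value <= 0:
            rejected.append((i, text))
        else:
            parsed.append((i, value))
    return parsed, rejected


parsed, rejected = parse_or_column(["1.05", "0.93", "1e48", "NA", "1e400"])
print(parsed)    # rows 0, 1 and 2 parse
print(rejected)  # rows 3 ("NA") and 4 ("1e400", overflows to inf)
```

Note that a value such as 1e48 parses as a valid float here. It would still be excluded by the OR filter (OR > 2 or OR < 0.5) described in the reply below.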

wouterpeyrot commented 3 years ago

The software transforms effect sizes from one scale (OR) to another scale (linear regression with standardized genotypes and phenotypes, assuming 50/50 case-control ascertainment). This transformation can be inaccurate for OR > 2 or OR < 0.5, thereby risking type I error for a specific set of SNPs (stress test SNPs). Therefore, CC-GWAS filters out SNPs with OR > 2 or OR < 0.5, and it is not desirable to change these cut-offs. Best wishes, Wouter
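The cut-off rule described above can be sketched as follows. This is an illustrative reimplementation, not the CC-GWAS source: the function names and the (SNP ID, OR) input format are hypothetical, and it assumes SNPs with OR exactly 0.5 or exactly 2 are kept, since only OR > 2 or OR < 0.5 is said to be filtered.

```python
# Cut-offs stated in the reply above; boundary values are assumed to be kept.
OR_MIN, OR_MAX = 0.5, 2.0


def passes_or_filter(odds_ratio):
    """Return True if a SNP with this OR would be retained (0.5 <= OR <= 2)."""
    return OR_MIN <= odds_ratio <= OR_MAX


def filter_snps(snps):
    """Keep only (snp_id, odds_ratio) pairs whose OR lies within the cut-offs.

    Hypothetical helper for illustration; the SNP IDs below are made up.
    """
    return [(snp, odds_ratio) for snp, odds_ratio in snps if passes_or_filter(odds_ratio)]


example = [("rs1", 1.2), ("rs2", 2.5), ("rs3", 0.45), ("rs4", 0.5)]
print(filter_snps(example))  # rs2 and rs3 are dropped
```

The two cut-offs are symmetric on the log scale (log 2 = -log 0.5), so the rule removes equally strong risk and protective effects. This is why variants such as APOE4 and APOE2 would both be excluded.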

ghost commented 3 years ago

Many thanks