vatlab / varianttools

software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

Error in association analysis #19

Open NandiniBN opened 7 years ago

NandiniBN commented 7 years ago

Hello, I am trying to run a burden test and I'm running into the following error:

vtools associate variant aff -m "LogitRegBurden --alternative 2" -j1 --to_db logit > all.asso.res
INFO: 127 samples are found
INFO: 215112 groups are found
Loading genotypes: 100% [========================================] 127 0.4/s in 00:05:25
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)

Also, when I load the phenotype info, it says "ERROR: Invalid or missing value detected for field sex. Allowed values are M/F, 1/2, Male/Female." But all the genders are coded M/F, with no missing values.

Any advice on how to proceed?

Thank you,

gaow commented 7 years ago

Our logistic regression routine is a textbook implementation of a second-order Newton-Raphson approximation method. It can run into problems with small sample sizes, because the matrix it needs to invert might not be invertible. I'd suggest using the linear regression routines with permutations for small sample sizes. Many machine learning packages now use more advanced logistic regression techniques that we could potentially adopt, but we have yet to work on that.
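To illustrate the invertibility problem in general terms, here is a minimal NumPy sketch. It is not the varianttools implementation; the toy data and the function name `logistic_hessian` are made up for illustration. It assumes one common failure mode, perfectly separated cases and controls, which is easier to hit with few samples or rare variants and may or may not be what happened in this run. As the coefficients grow along the separating direction, the matrix that a Newton-Raphson step must invert shrinks toward singular.

```python
import numpy as np

def logistic_hessian(X, beta):
    """Hessian of the negative log-likelihood, X^T W X, for logistic regression."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
    w = p * (1.0 - p)                      # per-sample weights, near 0 when p is near 0 or 1
    return X.T @ (X * w[:, None])

# Toy design matrix: intercept plus one burden score (hypothetical data).
# Cases (score > 0) and controls (score < 0) are perfectly separated.
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])

# Under separation the likelihood keeps improving as the slope grows,
# so Newton-Raphson pushes beta outward along this direction.
scales = [1.0, 5.0, 20.0]
min_eigs = [
    float(np.linalg.eigvalsh(logistic_hessian(X, s * np.array([0.0, 1.0]))).min())
    for s in scales
]

for s, e in zip(scales, min_eigs):
    print(f"slope={s:5.1f}  smallest Hessian eigenvalue={e:.3e}")
```

The smallest eigenvalue drops toward zero as the slope grows, so the linear solve inside each Newton-Raphson update becomes numerically unstable.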

It's very hard to say whether all rows are indeed properly coded without looking at the phenotype file. I'd suggest loading the data into R or Python and checking the unique values in that sex field. Most likely there is a missing value or abnormal coding that escaped your eye. I hope this information is helpful.
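A minimal Python sketch of that check follows. It assumes the phenotype file is tab-delimited with a header row and a column named `sex`; the function names and the inline sample data are hypothetical. It compares raw values against the allowed codes listed in the error message using an exact match, so that stray whitespace or case differences show up. Whether vtools itself is that strict is an assumption.

```python
import csv
import io
from collections import Counter

# Allowed codes as listed in the error message; exact matching is an assumption.
ALLOWED = {"M", "F", "1", "2", "Male", "Female"}

def count_sex_values(handle, field="sex", delimiter="\t"):
    """Count every raw value found in the given column of a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row.get(field) for row in reader)

def suspicious(counts):
    """Return values outside ALLOWED, shown with repr() so whitespace is visible."""
    return {repr(value): n for value, n in counts.items() if value not in ALLOWED}

# Hypothetical example data: one trailing space, one empty cell, one lowercase code.
sample = io.StringIO(
    "sample_name\tsex\taff\n"
    "S1\tM\t1\n"
    "S2\tF \t2\n"
    "S3\t\t1\n"
    "S4\tf\t2\n"
)
counts = count_sex_values(sample)
bad = suspicious(counts)
print(bad)
```

To run it on a real file, replace the `io.StringIO` sample with something like `open("phenotype.txt", newline="")`, where the file name is a placeholder. An empty result means every value matched one of the allowed codes exactly.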