rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Region-based association tests error #472

Open sbguarch opened 12 months ago

sbguarch commented 12 months ago

Hi,

This error has been reported in previous posts, but I encountered it again when running region-based association tests. I get the error message pasted below for some genes; if I exclude them, execution runs smoothly until another gene triggers the exact same error. The software does not seem able to skip such genes by default. I'm running the conda release of regenie 3.3.

ERROR: Throw location unknown (consider using BOOST_THROW_EXCEPTION) Dynamic exception type: boost::wrapexcept std::exception::what: Error in function boost::math::cdf(const chi_squared_distribution&, double): Chi Square parameter was -nan, but must be > 0 !
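
Until the crash itself is fixed, the exclude-and-rerun loop described above can be scripted. The following is a minimal Python sketch, not part of regenie: it reruns a user-supplied step 2 command once per set name with regenie's `--extract-setlist` option and records the sets whose run fails. The function name `find_failing_sets`, the `runner` parameter, and the assumption that a crashed run exits non-zero or prints `ERROR:` are illustrative assumptions.

```python
import subprocess


def find_failing_sets(base_cmd, set_names, runner=subprocess.run):
    """Rerun base_cmd once per set and return the sets whose run failed.

    base_cmd: the usual regenie step 2 command as a list of arguments,
              without any --extract-setlist option (supplied by the caller).
    runner:   hypothetical hook so the loop can be exercised without regenie.
    """
    failing = []
    for name in set_names:
        # Restrict this run to a single set.
        cmd = list(base_cmd) + ["--extract-setlist", name]
        proc = runner(cmd, capture_output=True, text=True)
        output = (proc.stdout or "") + (proc.stderr or "")
        # Assumption: a crash shows up as a non-zero exit code or an "ERROR:" message.
        if proc.returncode != 0 or "ERROR:" in output:
            failing.append(name)
    return failing
```

The returned names could then be excluded from the full run. Each rerun repeats the null-model fitting, so this is only practical for a short list of suspect sets.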

joellembatchou commented 11 months ago

Hi,

Can you include the full REGENIE log when running with options --debug and --extract-setlist with the name of a gene set where you get the error?

Cheers, Joelle

sbguarch commented 11 months ago

Hi Joelle, Thanks for your answer. This is the message that I get with the options that you suggested and one of the genes that gives me an error:

set [1/1] : COL8A1 - 2 variants...1 chunks
     -reading in genotypes, computing gene-based tests and building masks...(1)memory usage=97MB...(2)memory usage=97MB...WARNING: 2/2 masks fail MAC filter and will be skipped...(3)memory usage=97MB...set #2; rare_mask [mu,nZ,w,w_a] = [1.434596734857831e-05,1,24.99569656479345,0.004481539359507942]
Q_SKAT for all masks: 0 290.8514109017937
Q_BURDEN for all masks: 0 290.8514109017937
Q_SKAT for all masks: 0 290.8514109017937
Q_BURDEN for all masks: 0 290.8514109017937
ERROR: Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::wrapexcept
std::exception::what: Error in function boost::math::cdf(const chi_squared_distribution&, double): Chi Square parameter was -nan, but must be > 0 !

joellembatchou commented 11 months ago

Hi,

Can you pull from the branch "debug_issue_472" (i.e. `git checkout debug_issue_472; git pull`), re-compile the REGENIE binary, and then re-run the same command?

sbguarch commented 9 months ago

Hi Joelle,

Sorry for not getting back to you sooner. Using your branch, I still get the same error. I really don't know how to solve this.

ERROR: Throw location unknown (consider using BOOST_THROW_EXCEPTION) Dynamic exception type: boost::wrapexcept std::exception::what: Error in function boost::math::cdf(const chi_squared_distribution&, double): Chi Square parameter was -nan, but must be > 0 !

Silvia

joellembatchou commented 8 months ago

Hi Silvia,

I've pushed additional updates to the same branch. Can you git pull and rerun? Please include the output from stdout & stderr.

Thank you, Joelle

jerome-f commented 2 months ago

Hi @joellembatchou,

I am facing this issue with v3.6. I patched the code with the print statements in debug_472 and then recompiled. Here are the details from the log and stdout/stderr:

Log

Chromosome 1 [1 sets in total]
   -fitting null logistic regression on binary phenotypes...done (6921ms) 
   -fitting null Firth logistic regression on binary phenotypes...done (1548904ms) 
 set [1/1] : SERBP1-ENSG00000142864 - 266 variants...1 chunks
     -reading in genotypes, computing gene-based tests and building masks...(1)memory usage=2834MB...(2)memory usage=2865MB...WARNING: 1/12 masks fail MAC filter and will be skipped...(3)memory usage=2897MB...ERROR: Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::wrapexcept<std::domain_error>
std::exception::what: Error in function boost::math::cdf(const chi_squared_distribution<double>&, double): Chi Square parameter was -nan, but must be > 0 !

stdout/err

Mask : p_mod
Mask : p_mod
Mask : p_mod
Mask : p_mod
#sites in mask=17
L:
532.4157026 610.0549629  700.188641 767.3792286 931.2162203 1259.923528  1709.24162  2060.89591 2556.774849 3178.087653 3861.931991 4316.716594 4703.956017 5307.314526 8951.367773 30035.06258
[muQ, scFac, sd, df, v0, vq]= [71482.52779 0.637185777 73412.77449 1.455765031 2188141593 5389435458 ]
tau=37596.48052 38306.84125 40437.92347 43989.72716 48962.25232 55355.49896  73114.5174 108561.5182
Q:
111221.5106 110272.3853 107425.0095 102679.3832 96035.50639 87493.37904 63765.24751 16403.89697
skato-acat logp= 0.482625874 0.4773786187 0.4614427109 0.4346865125 0.3982766255 0.3554112144 0.2628874663 0.1597682085
Mask : p_ben
Mask : p_ben
Mask : p_ben
#sites in mask=2

Pseudo-firth (fast) starting beta = 0
[1] beta.head=(0...); score=71.47685522522099
[[1]] beta=(0.08240476930097676...); bdiff=0.08240476930097676; score=-72.29481505112716
[[2]] beta=(0.05376452793166106...); bdiff=0.0286402413693157; score=-5.727409299584643
[[3]] beta=(0.05097978914608285...); bdiff=0.002784738785578209; score=-0.0872100431644629
[2] beta.head=(0.05093604827307301...); score=-4.296058279168049
[[1]] beta=(0.04878024377259624...); bdiff=0.002155804500476774; score=-0.05354707675582304
[3] beta.head=(0.04875268437042914...); score=0.2507644592421183
[4] beta.head=(0.04888178951727878...); score=-0.01486663581638403
[5] beta.head=(0.0488741473161599...); score=0.0008683894297423222
stopping criterion met (|-5.076910144019564e-05| < 0.00025)
Ni=6; beta=(0.04887459375298366); score=-5.076910144019564e-05
T_burden=1.97439038408723;R_factor_burden:1
L:
50.77772396109665
[muQ, scFac, sd, df, v0, vq]= [50.77772396109665 0.2352676153517196 305.2291994580939 1 5156.754501338659 93164.86420182891 ]
tau=917.3761456236113 917.8838914695458 919.4071290073492 921.9458582370216  925.500079158563 930.0697917719735 942.7634379203357 968.0999556324666
Q:
3910.545783 3909.180945 3905.086432 3898.262243 3888.708379  3876.42484 3842.303897 3774.198495
skato-acat logp=1.373582886 1.373208761 1.372059268 1.370053446 1.367057922 1.362890028 1.349514883 1.315791902

Pseudo-firth (fast) starting beta = 2.334519463
[1] beta.head=(2.334519463244378...); score=0.4149927849979309
[[1]] beta=(2.632933831233284...); bdiff=0.2984143679889057; score=-0.03526119489597412
[2] beta.head=(2.611281042288837...); score=-0.02956198299636603
[3] beta.head=(2.592930584231554...); score=0.001848870554702287
stopping criterion met (|-0.0001256591549118058| < 0.00025)
Ni=4; beta=(2.59408892046889); score=-0.0001256591549118058
uncorrected: 13.22176979708586 [=(43.87173887737698)^2/145.5727562696614] -> 7.360774339213094
Rsqrt:1.340241445461406                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1                 1
ERROR: Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::wrapexcept<std::domain_error>
std::exception::what: Error in function boost::math::cdf(const chi_squared_distribution<double>&, double): Chi Square parameter was -nan, but must be > 0 !

The print statements did not make it through to stdout/stderr, so I'm not sure where exactly the -nan originates. One recommendation: add code to catch this exception, report it to the log, and continue with the next test rather than exiting.
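
To make the recommendation concrete, here is a minimal Python sketch of the catch, log, and continue control flow. It is only an illustration of the pattern, not regenie's code: regenie is C++, where the analogous guard would be a try/catch around the Boost call that raises the `std::domain_error` shown in the log. The names `toy_tail_prob` and `run_tests` are hypothetical, and the toy tail probability is exact only for 2 degrees of freedom.

```python
import math


def toy_tail_prob(stat, df):
    """Toy chi-square upper-tail probability, exact only for df == 2.

    Mimics the failure in the log: an invalid df raises instead of returning.
    """
    if math.isnan(df) or df <= 0:
        raise ValueError(f"Chi Square parameter was {df}, but must be > 0 !")
    if df != 2:
        raise NotImplementedError("toy example only handles df == 2")
    return math.exp(-stat / 2.0)  # closed form of the df = 2 upper tail


def run_tests(tests, tail_prob=toy_tail_prob, log=print):
    """Compute a p-value per test; log and skip tests whose inputs are invalid."""
    pvalues, skipped = {}, []
    for name, (stat, df) in tests.items():
        try:
            pvalues[name] = tail_prob(stat, df)
        except ValueError as err:
            # Report the problem and move on instead of aborting the whole run.
            log(f"WARNING: skipping {name}: {err}")
            skipped.append(name)
    return pvalues, skipped
```

With this structure, a single test with a NaN parameter ends up in `skipped` with a warning in the log, while the remaining tests still produce p-values.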

jerome-f commented 2 months ago

@joellembatchou weirdly, this happens only when running across multiple phenotypes. I ran the debugging runs again for each individual phenotype instead of running them together, and they all ran without encountering this error. I looked at the mask files and they are different (different sets of residues) between phenotypes for the same mask grouping:

debug_bt_X172.11.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton   17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X244.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X274.1.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton    17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X275.1.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton    17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X280.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X365.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X495.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X555.1.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton    17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778615_T_G,17_75778618_G_A,17_75778860_C_G,17_75778902_G_A,17_75779077_G_A,17_75779087_C_T
debug_bt_X555.2.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton    17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778615_T_G,17_75778618_G_A,17_75778860_C_G,17_75778902_G_A,17_75779077_G_A,17_75779087_C_T
debug_bt_X555.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778615_T_G,17_75778618_G_A,17_75778860_C_G,17_75778902_G_A,17_75779077_G_A,17_75779087_C_T
debug_bt_X585.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X714.1.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton    17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A
debug_bt_X721.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton  17  75778599    17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779087_C_T,17_75779107_G_A

This raises the question of whether the phenotypes should be tested independently instead of listing multiple phenotypes in --phenoColList. Any clarification would be helpful.
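
The per-phenotype mask listings above can also be compared programmatically. The following Python sketch assumes each line has the grep-style form `file:mask chrom pos variant,variant,...` seen in the listing; the function names `parse_snplist_line` and `diff_masks` are hypothetical. It reports which variants appear in one phenotype's mask but not in the other's, using the first two lines of the listing as input.

```python
def parse_snplist_line(line):
    """Split one 'file:mask chrom pos variants' line into (file, mask, variant set)."""
    head, _chrom, _pos, variants = line.split()
    filename, _, mask = head.partition(":")
    return filename, mask, set(variants.split(","))


def diff_masks(line_a, line_b):
    """Return, per file, the variants present in that file's mask only."""
    file_a, mask_a, vars_a = parse_snplist_line(line_a)
    file_b, mask_b, vars_b = parse_snplist_line(line_b)
    if mask_a != mask_b:
        raise ValueError(f"different masks: {mask_a} vs {mask_b}")
    return {file_a: sorted(vars_a - vars_b), file_b: sorted(vars_b - vars_a)}


# First two lines of the listing above.
LINE_X172 = (
    "debug_bt_X172.11.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton 17 75778599 "
    "17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778717_T_C,17_75778824_T_G,"
    "17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A"
)
LINE_X244 = (
    "debug_bt_X244.chrom.17_masks.snplist:H3-3A-ENSG00000163041.p_del.singleton 17 75778599 "
    "17_75778611_C_T,17_75778614_A_T,17_75778618_G_A,17_75778642_G_A,17_75778717_T_C,17_75778824_T_G,"
    "17_75778860_C_G,17_75778902_G_A,17_75778916_G_A,17_75779077_G_A,17_75779087_C_T,17_75779107_G_A"
)

print(diff_masks(LINE_X172, LINE_X244))
```

For these two lines, the only difference is that 17_75778642_G_A is present in the X244 mask and absent from the X172.11 mask.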