saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0
66 stars 29 forks source link

Inconsistent set-based p-values between runs #25

Closed pedrumstick closed 2 years ago

pedrumstick commented 2 years ago

I ran SAIGE-GENE+ v1.0.4 on the same data with different sets of functional annotations but all other parameters equal (run 1 log) (run 2 log). Annotations for run 1 were set to --annotation_in_groupTest=PTV and run 2 were set to --annotation_in_groupTest=PTV,missense_damaging,splice_damaging,missense_other (i.e., PTV annotation is shared across both runs and variants comprising PTV in both group files were confirmed to be the same).

As I understand each annotation is tested independently, so I expected set-based results for PTV annotation should be equal across both runs. This is the case for most genes but I'm observing different p-values for a subset of genes (example shown below) .

Run 1

Region Chromosome Group   max_MAF  Pvalue Pvalue_Burden Pvalue_SKAT BETA_Burden SE_Burden   MAC MAC_case MAC_control Number_rare Number_ultra_rare
<chr>  <chr>      <chr>     <dbl>   <dbl>         <dbl>       <dbl>       <dbl>     <dbl> <dbl>    <dbl>       <dbl>       <dbl>             <dbl>
ADGRG7 chr3       PTV_f01     0.5 0.00114       0.00127     0.00139       0.926     0.287   329       23         306           4                26

Run 2

Region Chromosome Group   max_MAF Pvalue Pvalue_Burden Pvalue_SKAT BETA_Burden SE_Burden   MAC MAC_case MAC_control Number_rare Number_ultra_rare
<chr>  <chr>      <chr>     <dbl>  <dbl>         <dbl>       <dbl>       <dbl>     <dbl> <dbl>    <dbl>       <dbl>       <dbl>             <dbl>
ADGRG7 chr3       PTV_f01     0.5      1         0.795       0.796      0.0992     0.382   207        8         199           3                26

Upon further investigation, I noticed some (but not all) of these genes appear to have an error message in the log Reading ith marker failed.

I'm having difficulty understanding 1) why there are different p-values for all of these genes, and 2) why this error message would appear in one run but not the other with the same variants and population. Can you clarify whether this behaviour is expected?

Thank you in advance, and please let me know if there is any additional information that I can provide!

weizhouUMICH commented 2 years ago

Hi @pedrumstick,

Thanks so much for reporting the issue and proving the detailed information for us to investigate the issue! You are right. The annotations should be tested independent. Could you please try using --is_output_markerList_in_groupTest=TRUE for those problematic genes to see which markers are differently included for PTV? We are trying to replicate this issue.

Thanks, Wei

saigegit commented 2 years ago

Hi @pedrumstick,

We have found the issue is very likely because that the markers are not ordered by chromosome positions in the group file and this causes issues for reading markers from VCF files. This has been fixed in the new version v1.0.6. Please feel free to give it a try and re-open the issue if it persists.

Thanks, Wei