statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License
10.2k stars 3k forks source link

SUMM/ENH: rates, poisson means - missing parts, one sample, confint, power #8138

Open josef-pkt opened 2 years ago

josef-pkt commented 2 years ago

statsmodels.stats.rates has tests and tost for 2 independent samples.

We should be able to get the collection of functions similar to the ones for proportion

related:

I don't see a related open PR or issue with code. Most likely I never worked on these.

josef-pkt commented 2 years ago

I don't find a lot for confidence intervals

two that look fine, mainly based on MOVER

both articles have confint for one poisson rate and 2indep for ccomparison in MC with formulas and references

Li, Hui-Qiong, Man-Lai Tang, Wai-Yin Poon, and Nian-Sheng Tang. "Confidence intervals for difference between two Poisson rates." Communications in Statistics-Simulation and Computation 40, no. 9 (2011): 1478-1493.

Li, Hui-Qiong, Man-Lai Tang, and Weng-Kee Wong. "Confidence intervals for ratio of two Poisson rates using the method of variance estimates recovery." Computational Statistics 29, no. 3 (2014): 869-889.

see #6498 for MOVER

josef-pkt commented 2 years ago

Gu K, Ng HKT, Tang ML, Schucany WR. Testing the ratio of two Poisson rates. Biom J. 2008;50(2):283-298. I used it for the 2indep tests

they have the one-sided sample size formulas for their different test methods. It looks like the analogue of statsmodels.stats.proportion.samplesize_proportions_2indep_onetail for poisson ratio instead of binomial diff

appendix B has power formulas for one-sided test ratio ratio: alternative-ratio R' > null-ratio R power for given c = R / R'

will require a bit of work to get it for two-sided tests But the case should be mainly gaussian with hypothesis specific variances.

related conditional test, Fay rateratio in R has binomial distribution.

josef-pkt commented 2 years ago

possible idea: add option dispersion /scale correction to standalone inference functions.

The idea is to get an equivalent to overdispersed Poisson or/and Poisson with HC standard errors.

related would be negative binomial tests for possibly different variance functions (NB1, NB2, NBP)

To estimate diff in rates, we will need linear/identity link which we don't have as option in discrete count models.

This article looks interesting for this (based on very brief skimming)

Scosyrev, Emil, and Abhijit Pethe. 2022. “Confidence Intervals for Exposure-Adjusted Rate Differences in Randomized Trials.” Pharmaceutical Statistics 21 (1): 103–21. https://doi.org/10.1002/pst.2155.

I find now many more articles, in this area (It will take some time to read or skim those and get an overview of what we want) e.g. for standard Poisson with HC or excess dispersion we can just use GLM, GEE or discrete

However for sample size and power computation, it is still better or necessary to have an explicit function.

josef-pkt commented 2 years ago

ratesci has confidence intervals for Poisson and Binomial in the same functions https://cran.r-project.org/web/packages/ratesci/index.html

references are mainly for Binomial (e.g. Gart/Nam, ... ) which I had used for Binomial proportions. docstrings don't have detailed information

Laud, Peter J. 2017. “Equal-Tailed Confidence Intervals for Comparison of Rates.” Pharmaceutical Statistics 16 (5): 334–48. https://doi.org/10.1002/pst.1813. details in appendix and follow up articles, corrigendum, comment, reply

josef-pkt commented 2 years ago

a quick try: power forequivalence test for 2indep poisson ratio

based on Zhu and example in PASS docs

needs many arguments (currently only 1 of 2 variance methods)

def power_equivalence_poisson_2indep(rate1, nobs1, rate2, nobs2, exposure, low, upp, alpha=0.025, dispersion=1):
    v1 = dispersion / exposure * (1 / rate2 + 1 / (nobs_ratio * rate1))
    v0_low = v0_upp = v1
​
    crit = norm.isf(alpha)
    pow = (
        norm.cdf((np.sqrt(nobs2) * (np.log(rate1 / rate2) - np.log(low)) - crit * np.sqrt(v0_low)) / np.sqrt(v1)) +
        norm.cdf((np.sqrt(nobs2) * (np.log(upp) - np.log(rate1 / rate2)) - crit * np.sqrt(v0_upp)) / np.sqrt(v1))  - 1
        )
    return pow

cases = [
    (1.9, 704, 704, 0.90012),
    (2.0, 246, 246, 0.90057),
    (2.2, 95, 95, 0.90039),
    (2.5, 396, 396, 0.90045),
]

for case in cases:
    rate1, nobs1, nobs2, p = case
    pow = power_equivalence_poisson_2indep(rate1, nobs1, rate2, nobs2, exposure, low, upp, alpha=alpha, dispersion=dispersion)
    print(pow, case, pow - p)

0.9001222145464203 (1.9, 704, 704, 0.90012) 2.214546420242769e-06
0.9005694062765952 (2.0, 246, 246, 0.90057) -5.93723404773705e-07
0.9003919004778025 (2.2, 95, 95, 0.90039) 1.900477802463385e-06
0.9004463728439553 (2.5, 396, 396, 0.90045) -3.6271560446277107e-06
josef-pkt commented 2 years ago

similar method can also be used for comparing proportions, but I didn't see it in PASS.

e.g. e.g. power for non-zero null

Farrington, Conor P., and Godfrey Manning. 1990a. “Test Statistics and Sample Size Formulae for Comparative Binomial Trials with Null Hypothesis of Non-Zero Risk Difference or Non-Unity Relative Risk.” Statistics in Medicine 9 (12): 1447–54. https://doi.org/10.1002/sim.4780091208.

correction to simulations: Schoder, Volker. 2002. “Test Statistics and Sample Size Formulae for Comparative Binomial Trials with Null Hypothesis of Non-Zero Risk Difference or Non-Unity Relative Risk by C. P. Farrington and G. Manning, Statistics in Medicine 1990; 9:1447–1454.” Statistics in Medicine 21 (13): 1958–60. https://doi.org/10.1002/sim.1242.

josef-pkt commented 2 years ago

more on comparing poisson 2indep PASS has two approaches when testing with a nonzero null margin following Gu which I used for the test function (only one-sided tests) https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Ratio_of_Two_Poisson_Rates-Gu.pdf following Zhu which is the inferiority, equivalence article https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Ratio_of_Two_Poisson_Rates-Zhu.pdf doc of Zhu based test mentions 2-sided alternative but does not have example for it.

and another one for equality testing using diff = 0 or diff of sqrt transformed = 0 https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Difference_Between_Two_Poisson_Rates.pdf

aside for test of ratio of two negative Binomial, PASS uses Zhu and Lakkis however, for 2-sided they use the one tail power formula using abs value of ratiodiff) referencing "From Zhu and Lakkis (2014), page 379," https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Ratio_of_Two_Negative_Binomial_Rates.pdf

I have seen several articles that use one tail power formula for two-sided test without making comments that this is one tail rejection. (I think using one tail rejection is an upper bound on sample size, but often they write nobs > one_tail_samplesize. ??? as long as we are in the correct tail, or use abs for two sided)

josef-pkt commented 2 years ago

missing methods in test_poisson_2indep

Those are in notes section of docstring but not implemented PASS has them in the GU method for ratio of Poisson

References for the following are missing, etest has Gu et al as reference, but AFAIR that's incorrect

- 'exact-cond': exact conditional test based on binomial distribution
- 'cond-midp': midpoint-pvalue of exact conditional test
- 'etest': etest with score test statistic
- 'etest-wald': etest with wald test statistic

The 5 Gu et all methods are based on normal distribution, so we could add a dispersion factor to adjust scale

Zhu 2016 for equivalence in poisson and NB2 corresponds (basically) to wald-log and score-log, however Gu test statistics are defined in terms of counts, while Zhu statistics are directly in rates.

I had implemented our tost_poisson_2indep to directly delegate to the Gu test function. (one-sided with a margin)

I need to compare power computation between Gu and Zhu (and merge or adjust the functions) My new functions for power in one-sided and equivalence tests match the Zhu functions in PASS.

extra options that differ across articles, e.g. common exposure, same across samples some articles allow varying exposure (differences in follow-up times) Pure poisson only needs total exposure time n * t, but it can vary with excess dispersion and negbin

update reference for etest seems to be https://github.com/statsmodels/statsmodels/issues/2607#issuecomment-160144908 K. Krishnamoorthy, Jessica Thomson A more powerful test for comparing two Poisson means update2: Krishnamoorthy and Thomson is for diff. The current implementation uses ratio (score version uses constraint MLE and not the KT moment estimate). (end update2)

cond tests based on binom test has references to R functions and Fay not sure what's the original article

josef-pkt commented 2 years ago

aside: def proportion_confint(count, nobs, alpha=0.05, method='normal') has the "wrong" default method

I'm looking at one sample poisson rate confidence intervals, and "wald" is also bad there, if expected counts are small.

aside 2: confint_proportions_2indep newcomb is a MOVER confint, but we don't have an option to choose which one-sample confint method is used

elif method.startswith('newcomb'):
            low1, upp1 = proportion_confint(count1, nobs1,
                                            method='wilson', alpha=alpha)
            low2, upp2 = proportion_confint(count2, nobs2,
                                            method='wilson', alpha=alpha)

for poisson comparisons, I find articles that have several options for 2indep MOVER confint

josef-pkt commented 2 years ago

a reference from a different field with a good overview and reference list for binomial and poisson (not a lot of formulas included)

Cousins, Robert D., Kathryn E. Hymes, and Jordan Tucker. 2010. “Frequentist Evaluation of Intervals Estimated for a Binomial Parameter and for the Ratio of Poisson Means.” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 612 (2): 388–98. https://doi.org/10.1016/j.nima.2009.10.156.

josef-pkt commented 2 years ago

a good set for confint for one sample poisson rate

monte carlo comparisons and recommendations: looks good as basis for rates_poisson_confint in analogy to proportion_confint

Barker, Lawrence. 2002. “A Comparison of Nine Confidence Intervals for a Poisson Parameter When the Expected Number of Events Is ≤ 5.” The American Statistician 56 (2): 85–89. https://doi.org/10.1198/000313002317572736.

Patil, VV, and HV Kulkarni. 2012. “Comparison of Confidence Intervals for the Poisson Mean: Some New Aspects.” REVSTAT–Statistical Journal 10 (2): 211–27.

Swift, Michael Bruce. 2009. “Comparison of Confidence Intervals for a Poisson Mean – Further Considerations.” Communications in Statistics - Theory and Methods 38 (5): 748–59. https://doi.org/10.1080/03610920802255856.

still need to find and decide on references for poisson 2indep confint

the following compares MOVER confint for rates diff based on 4 one sample confint methods, and 3 other confint

Li, Hui-Qiong, Man-Lai Tang, Wai-Yin Poon, and Nian-Sheng Tang. 2011. “Confidence Intervals for Difference Between Two Poisson Rates.” Communications in Statistics - Simulation and Computation 40 (9): 1478–93. https://doi.org/10.1080/03610918.2011.575509.

other articles are for ratio of poisson rates

josef-pkt commented 2 years ago

and analogy in model Poisson, GLM

in GLM we can use poisson with sqrt link, which should be similar to the one and two sample tests.

We still don't have Firth penalization, which is just Jeffreys prior. In the one and two sample test, that mainly adds a fractional (0.5) count to the observed counts (?). It uses beta posterior in binomial and gamma posterior in poisson for one sample case. (I still need to check the details for those methods, and what applies to tests and what to confint.)

josef-pkt commented 2 years ago

confint for one poisson mean in R

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/cipoisson.html has exact and continuity corrected anscomb (anscomb is sqrt(x + 3/ 8) transform, but they add cc )

Patil and Kulkarni have anscomb without continuity correction, similar to Barker 2002 RVS method, but Barker has strange handling of nobs n (which doesn't match up with exposure in corrections with constants like 3/8)
It looks like 3/8 in anscombe should be for counts and not for rate x-bar = count / n

functions in other R packages that I haven't tried out yet

https://rdrr.io/cran/epitools/man/pois.conf.int.html has 4 methods exact method (pois.exact), gamma distribution (pois.daly), Byar's formula (pois.byar), or normal approximation to the Poisson distribution (pois.approx). I don't know how "daly" gamma and "exact" differ update daly is just exact, Daly is the author of an implementation for SAS (not a relevant reference) in the example count, exposure = 3, 400, pois.exact and pois.daly give the same answer, and the same as gamma exact pois.approx is just plain wald confint.

https://search.r-project.org/CRAN/refmans/DescTools/html/PoissonCI.html has method = c("exact", "score", "wald", "byar") I have no idea what byar is, it's not in any of the 3 articles that I use they only say: "Byar's method is quite a good approximation. Rothman and Boice (1979) mention that these limits were first proposed by Byar (unpublished)."

Their score confint matches mine when I use ci as defined in swift for count and then divide by exposure and match score confint coded using barker 2002.

related aside http://onbiostatistics.blogspot.com/2014/03/computing-confidence-interval-for.html uses chisquare probabilities for confint for count in references and my code, we use Gamma distribution for confint for rate. I haven't looked into the relationship between these two. Patil and Kulkarni also use chi2 for count confint

josef-pkt commented 2 years ago

aside: confusing referencing

confint for one sample poisson Swift 2009 cites Vandenbrouke 1982 for a modified sqrt transformed confint. Vandenbrouke's article doesn't look like that to me, it's a verbal description of essentially just sqrt transformation. I will drop it again, I don't see an easy way to figure out what hypothesis tests corresponds to the confint formula (without fixing alpha). sqrt - anscombe version with sqrt(x + 3/ 8) is similar but with fixed added c=3/8 and has both test and confint easily available

WaldCC: There are two versions of continuity correction for wald test of a poisson mean. Barker 2002 adds 0.5 in the computation of variance, but does not shift the mean Patil and Kulkarni 2012 have waldcc shifted -0.5 for lower ci, and +0.5 for upper ci, with same shift used in variance.

for now I changed name for Barker version to "waldccv" but continuity correction is pretty useless, either drop it, or keep it just for comparison and recommending against it.

also: "exact-c": for central exact methods, for poisson I only implement the central version, no adjusted version that adjusts length like minlike, baker, ... (aside: I will add exact-c to binomial)

not clear modified wald and other methods Barker adds modified versions that avoid degenerate confidence intervals for zero counts, e.g. in wald, and sqrt In those cases: if count = 0, then exact-c confidence interval is used

either separate method like waldm (mwald) or separate keyword for zeros handling, with default to "modified=True"

Not sure yet: hypothesis test for poisson H0: rate = 0 when observed count = 0 produces nan in some methods, I'm not sure there is a "proper" answer. (one case in R had pvalue "< 1e-16" reported) (should be pvalue=1 ?)

josef-pkt commented 2 years ago

8166 has been merged into main

what's open from this issue?

still several things, like (generic) "cond" methods for 2indep that need proportion refactor. but I don't have an overview what else

josef-pkt commented 1 month ago

The example notebook needs some improvements for example missing subheaders and not very informative toc in 2-sample, we have

also more precise phrasing "Note: The nonequivalence test is in general conservative, its size is bounded by alpha, but in the large sample limit with fixed nonequivalence margins the size approaches alpha / 2"

"the size approaches alpha / 2" would be better "the size bound approaches alpha / 2" or "the bound on the size approaches alpha / 2"

i.e. it's only alpha / 2 at the indifference, non-equivalence bounds. For points in the interior of the null interval the size goes to zero, i.e. we "essentially" never reject in large samples if null is true with true parameter away from the boundary points.