Suite of Parametric Distributions

vincenzocoia commented 3 years ago

(Continuation of the issue in dplyr of the same name)

vincenzocoia commented 3 years ago

New things that came up from today:

1. Get document() working from the new dst_*() additions (roxygen2 tags need correcting).
2. Parameterization in .quantities needs to match that of the parameterization in R's r,p,d,q functions -- not the parameterization (and names) used in our dst_*() functions.
3. Possibly consider debugging why variance.dst(dst_gpd(0, 1, 1)) is trying to access a pgpd() function (which does not exist) instead of eval_cdf.gpd(), which does exist.
   - Bigger question -- is it worthwhile making pgpd(), qgpd(), and dgpd(), and just have the .parametric methods of eval_cdf() etc dispatched? Maybe so, because sometimes it's just useful to call the p/d/q/r functions directly.
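
For reference, here is a minimal sketch of what standalone p/q/d functions for the GPD could look like. The names pgpd(), qgpd(), and dgpd() come from the question above, but the argument names (location, scale, shape), their order, and the restriction to a single shape value are assumptions for illustration, not the package's actual interface.

```r
# Hypothetical standalone GPD functions; argument names are assumptions.
# `shape` is assumed to be a single number and `scale` to be positive.

# CDF: 1 - (1 + shape * z)^(-1 / shape), with the exponential limit at shape == 0
pgpd <- function(q, location, scale, shape) {
  z <- pmax((q - location) / scale, 0)  # values below `location` give CDF 0
  if (shape == 0) {
    1 - exp(-z)
  } else {
    # pmax() caps the CDF at 1 beyond the upper endpoint when shape < 0
    1 - pmax(1 + shape * z, 0)^(-1 / shape)
  }
}

# Quantile function: inverse of the CDF above, for p in [0, 1)
qgpd <- function(p, location, scale, shape) {
  if (shape == 0) {
    location - scale * log(1 - p)
  } else {
    location + scale * ((1 - p)^(-shape) - 1) / shape
  }
}

# Density: zero outside the support
dgpd <- function(x, location, scale, shape) {
  z <- (x - location) / scale
  inside <- z >= 0 & (1 + shape * z) > 0
  dens <- numeric(length(x))
  zi <- z[inside]
  dens[inside] <- if (shape == 0) {
    exp(-zi) / scale
  } else {
    (1 + shape * zi)^(-1 / shape - 1) / scale
  }
  dens
}

pgpd(1, location = 0, scale = 1, shape = 1)    # 0.5
qgpd(0.5, location = 0, scale = 1, shape = 1)  # 1
dgpd(1, location = 0, scale = 1, shape = 1)    # 0.25
```

With functions like these available, a generic .parametric method could look them up by name in the same way it presumably does for pnorm(), qnorm(), and so on.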

yelselmiao commented 2 years ago

1. Get document() working from the new dst_*() additions (roxygen2 tags need correcting).

RESOLVED

2. Parameterization in .quantities needs to match that of the parameterization in R's r,p,d,q functions -- not the parameterization (and names) used in our dst_*() functions.

RESOLVED

3. Possibly consider debugging why variance.dst(dst_gpd(0, 1, 1)) is trying to access a pgpd() function (which does not exist) instead of eval_cdf.gpd(), which does exist.

eval_survival.parametric() cannot handle gpd objects. We added eval_survival.gpd() to avoid accessing pgpd().

zhuzp98 commented 2 years ago

eval_survival.gpd is missing

When we run variance.dst(dst_gpd(..., ..., ...)), the call goes to eval_survival.parametric(). However, eval_survival.parametric() cannot handle gpd objects. In fact, we need eval_survival.dst() to handle gpd objects. At the moment, parametric is a subclass of dst in the gpd object, so we may need to write an eval_survival.gpd() specifically for gpd objects. Doing so fixed the calculation of variance.dst(dst_gpd(..., ..., ...)).

We wrote an eval_survival.gpd():

# Survival function for gpd objects: S(x) = 1 - F(x), computed via eval_cdf()
eval_survival.gpd <- function(distribution, at) {
  1 - eval_cdf(distribution, at = at)
}
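
To illustrate why adding a gpd-specific method changes which function runs, here is a toy S3 sketch. The generic eval_survival_demo() and its methods are hypothetical stand-ins, and the class vector c("gpd", "parametric", "dst") is an assumption about how gpd objects are classed; the real package may differ.

```r
# Toy generic (hypothetical name) that dispatches on the class vector
eval_survival_demo <- function(distribution, at) {
  UseMethod("eval_survival_demo")
}

# Stand-in methods that only report which one was called
eval_survival_demo.dst <- function(distribution, at) "dst method"
eval_survival_demo.parametric <- function(distribution, at) "parametric method"

# Assumed class order: most specific class first
d <- structure(list(), class = c("gpd", "parametric", "dst"))

# Without a gpd method, dispatch stops at the first class that has a method
before <- eval_survival_demo(d, at = 1)
print(before)  # "parametric method"

# Once a gpd method exists, it takes precedence over the other two
eval_survival_demo.gpd <- function(distribution, at) "gpd method"
after <- eval_survival_demo(d, at = 1)
print(after)   # "gpd method"
```

S3 dispatch walks the class vector from left to right and uses the first method it finds, which is why a method for the most specific class shadows the parametric one.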

Here is the output from the test. The remaining error now seems to come from roundoff.

==> Testing R file using 'testthat'

ℹ Loading distionary

══ Testing test-quantities.R ═══════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 2 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 3 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 4 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 4 ]norm parametric dst

 name :
[1] "norm"
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 5 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 3 | WARN 0 | SKIP 0 | PASS 5 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 3 | WARN 0 | SKIP 0 | PASS 6 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 4 | WARN 0 | SKIP 0 | PASS 6 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 4 | WARN 0 | SKIP 0 | PASS 7 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 5 | WARN 0 | SKIP 0 | PASS 7 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 5 | WARN 0 | SKIP 0 | PASS 8 ]gpd parametric dst

 name :
[1] "gpd"
[ FAIL 6 | WARN 0 | SKIP 0 | PASS 8 ]

── Failure (test-quantities.R:49:7): quantities align with numeric computations. ──
`var1` (`actual`) not equal to variance.dst(d) (`expected`).

  `actual`: 1.000000
`expected`: 0.999997

── Failure (test-quantities.R:49:7): quantities align with numeric computations. ──
`var1` (`actual`) not equal to variance.dst(d) (`expected`).

  `actual`: 2.00000000
`expected`: 1.99999996

── Failure (test-quantities.R:49:7): quantities align with numeric computations. ──
`var1` (`actual`) not equal to variance.dst(d) (`expected`).

  `actual`: 3.329427
`expected`: 3.329430

── Failure (test-quantities.R:49:7): quantities align with numeric computations. ──
`var1` (`actual`) not equal to variance.dst(d) (`expected`).

  `actual`: 6.18110
`expected`: 6.18108

── Failure (test-quantities.R:49:7): quantities align with numeric computations. ──
`var1` (`actual`) not equal to variance.dst(d) (`expected`).

  `actual`: 192.2338
`expected`: 192.2339

── Error (test-quantities.R:49:7): quantities align with numeric computations. ──
Error: roundoff error is detected in the extrapolation table
Backtrace:
 1. testthat::expect_equal(var1, variance.dst(d)) test-quantities.R:49:6
 4. distionary::variance.dst(d)
 5. stats::integrate(sf2, 0, Inf) /Users/zhipeng.zhu/Documents/Personal/distionary/R/variance.R:12:2

[ FAIL 6 | WARN 0 | SKIP 0 | PASS 8 ]

Test complete

vincenzocoia commented 2 years ago

Ah yes, that precision error is no big deal, that's to be expected when calculating an integral numerically. Let's allow for a more lenient tolerance in the expect_equal() function. I believe the argument is tol?

zhuzp98 commented 2 years ago

Ah yes, that precision error is no big deal, that's to be expected when calculating an integral numerically. Let's allow for a more lenient tolerance in the expect_equal() function. I believe the argument is tol?

Sounds good. I believe the issue of variance for GPD distribution should be solved once we add the tol argument.

yelselmiao commented 2 years ago

Let's allow for a more lenient tolerance in the expect_equal() function. I believe the argument is tol?

Okay! We can add a tolerance to the upcoming test cases.
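
As a small illustration of the tolerance idea, the sketch below compares one pair of values from the failure output above, first with the default tolerance and then with a more lenient one. In testthat, the argument to expect_equal() is spelled tolerance; the value 1e-3 is only an example, and the testthat call is guarded so the snippet also runs where testthat is not installed.

```r
# Values copied from one of the failure messages in this thread
actual   <- 3.329427
expected <- 3.329430

# Base R comparison with the default tolerance (about 1.5e-8): not equal
strict <- isTRUE(all.equal(actual, expected))
print(strict)   # FALSE

# Same comparison with a more lenient relative tolerance: equal
lenient <- isTRUE(all.equal(actual, expected, tolerance = 1e-3))
print(lenient)  # TRUE

# The corresponding testthat expectation, run only if testthat is available
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::expect_equal(actual, expected, tolerance = 1e-3)
}
```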

yelselmiao commented 2 years ago

Weekly update:

- The testthat tolerance is now 1e-03.
- I increased the allowed subdivisions for integrate() in median.R and mean.R to 2000, because integrate() was exceeding its maximum number of subdivisions.
- Passed the tests and fixed the bugs for pois, unif, beta, binom, nbinom, and exp.
- Working on the tests for weibull: it passes mean and variance but fails median. The formula of the hard-coded median looks fine to me.
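
Two of the items above can be sketched in a few lines. The contents of mean.R and median.R are not shown in this thread, so the integrand below is a stand-in (the survival function of an exponential distribution), and the Weibull parameter values are arbitrary examples. The first part shows how the subdivisions argument is passed to stats::integrate(); the second checks the textbook Weibull median, scale * log(2)^(1 / shape), against qweibull() in R's shape/scale parameterization.

```r
# Stand-in integrand: survival function of an Exponential(rate = 2).
# For a non-negative variable, the mean is the integral of the survival function.
sf <- function(x) stats::pexp(x, rate = 2, lower.tail = FALSE)

# Raise the subdivision limit from the default of 100 to 2000
mean_numeric <- stats::integrate(sf, 0, Inf, subdivisions = 2000L)$value
print(mean_numeric)  # close to the exact mean 1 / rate = 0.5

# Weibull median: closed form versus R's quantile function
shape <- 1.5  # arbitrary example value
scale <- 2    # arbitrary example value
median_formula  <- scale * log(2)^(1 / shape)
median_quantile <- stats::qweibull(0.5, shape = shape, scale = scale)
print(all.equal(median_formula, median_quantile))  # TRUE
```

If a hard-coded median matches this formula but the test still fails, one thing worth checking is whether the parameter names or order in the test differ from those used by qweibull().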

vincenzocoia commented 2 years ago

Thanks for the update. To make this message more effective, it should accompany a pull request.

vincenzocoia commented 2 years ago

(the reason being that it's hard for me/others to find these changes in the code)

yelselmiao commented 2 years ago

(the reason being that it's hard for me/others to find these changes in the code)

Thanks for the reminder! Please refer to the open PR :)

vincenzocoia commented 2 years ago

I forgot to comment on:

When we run variance.dst(dst_gpd(..., ..., ...)), it goes to the function eval_survival.parametric(). However, this eval_survival.parametric() can not handle gpd objects.

Nice detective work, you're indeed correct. I think your solution of writing eval_survival.gpd() is a good fix. There might be deeper implications when viewing various representations as being linked together by a dependency network. But for now, I think we can be happy with the new solution. Thank you!

vincenzocoia commented 2 years ago

Closing this Issue: its big-picture initial focus was useful to get us started, but now it's more useful to focus on specifics. Most of the topics in this Issue have been resolved, and I've moved the remaining two projects into their own Issues (#22 and #23).

probaverse / distionary

Suite of Parametric Distributions #9