Closed rho62 closed 1 year ago
This is the same as #72, isn't it? Also, are there any impediments to implementing this for continuous distributions as well?
Perhaps... not sure... It seems to me that there is a coding issue (only calling rtrunc vs sampleFromUntruncated) and a content/solution issue: how do we actually do it?
/R
From: Waldir Leoncio @.> Reply-To: ocbe-uio/TruncExpFam @.> Date: Monday, 21 February 2022, 11:11 To: ocbe-uio/TruncExpFam @.> Cc: Rene Holst @.>, Author @.***> Subject: Re: [ocbe-uio/TruncExpFam] Rewrite functions for sampling from discrete truncated distributions (Issue #77)
On second thought, I think you're right. Seems wise to separate things and leave #72 for the duplicated coding issue and #77 and #78 for the slow-sampling issue.
If I understood correctly, the Binomial implementation is here:
The f(x) / [F(b) - F(a)] part is clearly defined on L28. f(x) (i.e., `dens`) is transformed on L25 using `my.dbinom()`. If that is correct, then this idea could be replicated for `rtrunc()`, but unless I'm missing something there's no sampling involved in the function above, only rescaling of the densities (as expected, since resampling is only part of the `r*` functions).
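A minimal sketch of the rescaling described above, in plain base R rather than the package's actual `dtrunc` method. The helper name `dtrunc_binom_sketch` is hypothetical, and `stats::dbinom()` stands in for `my.dbinom()`. It also assumes both limits `a` and `b` are kept in the support, so the normalising mass is taken as F(b) - F(a - 1); the package may define the lower limit differently.

```r
# Hypothetical helper, not part of TruncExpFam: truncated binomial density.
# Assumption: the support is {a, ..., b}, both endpoints included.
dtrunc_binom_sketch <- function(x, size, prob, a = 0, b = size) {
  # Untruncated density, set to zero outside the truncation limits
  dens <- ifelse(x < a | x > b, 0, stats::dbinom(x, size, prob))
  # Probability mass of the retained support
  mass <- stats::pbinom(b, size, prob) - stats::pbinom(a - 1, size, prob)
  # Pure rescaling: no random sampling happens here
  dens / mass
}

# The rescaled densities sum to 1 over the truncated support
sum(dtrunc_binom_sketch(2:6, size = 10, prob = 0.4, a = 2, b = 6))
```

As noted above, nothing here draws random numbers; the function only returns weights that a sampler could reuse.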
So a DRY solution might involve the following steps:
1. Extract `f_T(x)` from the `dtrunc` methods into its own function. Could be a generic, since the `x` argument would have different `rtrunc_` classes.
2. Implement the new `rtrunc()`, temporarily coexistent with the current implementation.

One thing that worries me about this approach is that it will probably make the output of `rtrunc()` no longer match its `stats` counterparts, since the untruncated distribution will no longer be the base for the sampling. Is this acceptable?
An alternative to phasing out the old algorithm is to have `rtrunc()` take an argument (like a boolean `legacy`) that runs the old `stats`-compatible algorithm. This gives the user control over comparability with `stats` results vs. speed of the new implementation.
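To make the `legacy` idea concrete, here is a rough sketch for the binomial case only. The function name `rtrunc_binom_sketch` and its signature are hypothetical and do not reflect the real `rtrunc()` interface; the old algorithm is approximated by a simple draw-and-discard loop, and both limits are assumed to be included in the support.

```r
# Hypothetical signature; the real rtrunc() interface may differ.
rtrunc_binom_sketch <- function(n, size, prob, a = 0, b = size, legacy = FALSE) {
  if (legacy) {
    # Old algorithm (approximation): draw from the untruncated binomial
    # and discard values outside the limits until n values are kept
    out <- integer(0)
    while (length(out) < n) {
      y <- stats::rbinom(n, size, prob)
      out <- c(out, y[y >= a & y <= b])
    }
    return(out[seq_len(n)])
  }
  # New algorithm: draw directly from the support a, ..., b
  support <- a:b
  w <- stats::dbinom(support, size, prob)
  support[sample.int(length(support), n, replace = TRUE, prob = w)]
}

# With no effective truncation, the legacy path reproduces rbinom() under the same seed
set.seed(42); x_legacy <- rtrunc_binom_sketch(10, 10, 0.4, a = 0, b = 10, legacy = TRUE)
set.seed(42); x_stats <- stats::rbinom(10, 10, 0.4)
identical(x_legacy, x_stats)
```

The default path consumes the random number stream differently, so it should not be expected to match `stats::rbinom()` draw for draw, only in distribution.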
Approach pursued so far
Sample from the original (not truncated) distribution, then truncate. This is inefficient: it draws a surplus of elements that are discarded, and it is difficult to predict how many draws are needed to reach the target sample size.
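A small illustration of why this gets expensive, assuming a binomial whose support is kept on a, ..., b with both limits included. The helper `accept_prob` and the parameter values are made up for the example. Each untruncated draw is kept with probability P(a <= X <= b), so the expected number of draws for a target of n kept values is n divided by that probability, and the actual number varies from run to run.

```r
# Hypothetical helper: probability that one untruncated binomial draw is kept
accept_prob <- function(size, prob, a, b) {
  stats::pbinom(b, size, prob) - stats::pbinom(a - 1, size, prob)
}

p <- accept_prob(size = 10, prob = 0.4, a = 7, b = 10)
n_target <- 100

p             # share of draws that survive the truncation
n_target / p  # expected number of untruncated draws to keep 100 values
```

The further the truncation limits sit in the tail, the smaller `p` becomes and the more draws are wasted.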
Solution
Sample directly from the truncated distribution:
$$ f_T(x; \theta) = \frac{f(x; \theta)}{[F(b) - F(a)]} $$
Use `sample()` to sample from $a, ..., b$ with weights $f_T(a, ..., b; \theta)$.
Implemented for binomial. See code there. Needs to be implemented for other discrete distributions: Poisson, negative binomial, others?
Note: the weights $f_T(x; \theta)$ are already implemented as `dtrunc.XXXX()` functions.
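The direct-sampling idea could look roughly like the sketch below. The name `rtrunc_direct_sketch` is hypothetical, and the untruncated `stats` densities are used as weights in place of the `dtrunc.XXXX()` functions; this works because the `prob` argument is rescaled internally, so unnormalised weights proportional to $f_T$ are enough. `sample.int()` is used instead of `sample()` to avoid the special case where a support of length one is treated as `1:x`. A finite upper limit `b` is assumed, so an untruncated upper tail (e.g. Poisson) would need a separate cutoff.

```r
# Hypothetical helper: sample directly from a truncated discrete distribution.
# `dfun` is an untruncated density such as stats::dbinom or stats::dpois;
# extra parameters are passed through `...`.
rtrunc_direct_sketch <- function(n, dfun, a, b, ...) {
  support <- a:b
  # Weights proportional to f_T(x; theta) on a, ..., b
  w <- dfun(support, ...)
  # Draw indices into the support, with replacement
  support[sample.int(length(support), n, replace = TRUE, prob = w)]
}

set.seed(1)
rtrunc_direct_sketch(5, stats::dbinom, a = 7, b = 10, size = 10, prob = 0.4)
rtrunc_direct_sketch(5, stats::dpois, a = 2, b = 8, lambda = 3)
```

Every draw is used, so exactly `n` values are produced in a single call regardless of how little mass lies between `a` and `b`.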