mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0

Add basic distribution fitting functionality #10

Closed mitchelloharawild closed 2 years ago

mpadge commented 4 years ago

Just a ping here that this functionality would, in my opinion, be particularly likely to expand scope and usage of the package. It's an extremely non-trivial exercise, which is almost certainly why no other package offers this ability in any general way, but if you could crack it, the boost to analytical capacities (and package usage!) would be huge. I'm thinking in particular of data sufficiently large to prevent construction or inversion of covariance matrices, which happens all the time. The ability to fit distributions would enable data to be grouped, and the groups used to estimate (co-)variance distributions between groups. An impossibly large covariance matrix could then be effectively condensed down to a manageable size by populating it with parameters from empirically fitted distributions.

Some really good code to draw inspiration from for distribution fitting exists in @csgillespie's poweRlaw package (via R6 in that case), and I've got an extension of that in a package of my own. The latter illustrates the general principle that most approaches to distribution fitting still rely on Kolmogorov-Smirnov-type statistics, which at least makes the task of fitting individual distributions relatively straightforward. Comparing statistics between candidate distributions is where the nightmare starts ... Happy to chat about approaches any time you like.
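To make the Kolmogorov-Smirnov idea concrete, here is a minimal sketch using only base R. It is not the method used by poweRlaw or by any package mentioned in this thread; the simulated data, the two candidate families, and all variable names are assumptions chosen for illustration. Each candidate is fitted by its closed-form maximum-likelihood estimate, and the candidates are then ranked by their KS statistic.

```r
# Illustrative data (assumption): 500 draws from an exponential distribution
set.seed(42)
x <- rexp(500, rate = 2)

# Closed-form maximum-likelihood estimates for each candidate family
exp_rate  <- 1 / mean(x)
norm_mean <- mean(x)
norm_sd   <- sqrt(mean((x - mean(x))^2))

# KS distance between the empirical CDF and each fitted CDF
ks_exp  <- ks.test(x, "pexp", rate = exp_rate)
ks_norm <- ks.test(x, "pnorm", mean = norm_mean, sd = norm_sd)

# Smaller statistic = fitted CDF sits closer to the empirical CDF
ks_stats <- c(
  exponential = unname(ks_exp$statistic),
  normal      = unname(ks_norm$statistic)
)
best <- names(which.min(ks_stats))
print(ks_stats)
print(best)
```

Only the statistics are compared here. The p-values reported by `ks.test()` assume the parameters were fixed in advance, so they are not valid once the parameters have been estimated from the same data, which is part of why comparing candidate distributions gets difficult.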

mitchelloharawild commented 4 years ago

Thanks for your comment - external opinions on this are very welcome, as I have been needing this package more for representing distributions and less for fitting distributions. I expect to be working on this goal once the distribution structures are refined (#16, #25, #34, #36). However, if you feel strongly about having this functionality sooner and have the time to think about how this should work, then I can reprioritise it.

mpadge commented 4 years ago

Oh no, please don't reprioritise, as I'll hardly have time myself. It'll suffice for me to know that I'll automatically be pinged here when development finally does get going. Looking forward to it ... whenever it may come. Thanks!

csgillespie commented 4 years ago

Just to add to this. In {poweRlaw} I used reference classes (this was pre-R6). If I ever find time, I would change to R6 as they are a bit faster.

However, using reference/R6 classes does impact contributions from the community. Basically no-one understands reference classes, so the code is hard to change.

mitchelloharawild commented 2 years ago

Closing this issue as I believe it is out of scope. Instead of providing functions for fitting distributions, the package can have some vignettes describing basic optimisation methods using the package. Some possibly more complicated fitting methods can be described with reference to the packages above.

For example, MLE:

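library(distributional)

# Simulated data: 1000 draws from a standard normal
sample <- rnorm(1000)

# Log-likelihood of y under a normal distribution with mean par[1] and sd par[2]
normal_log_likelihood <- function(par, y) {
  log_likelihood(dist_normal(par[1], par[2]), list(y))
}

# fnscale = -1 makes optim() maximise rather than minimise
optim(
  par = c(mean = mean(sample), sd = sd(sample)),
  fn = normal_log_likelihood,
  y = sample,
  control = list(fnscale = -1)
)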
#> $par
#>       mean         sd 
#> 0.03815706 0.98306439 
#> 
#> $value
#> [1] -1401.78
#> 
#> $counts
#> function gradient 
#>       41       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

Created on 2021-11-08 by the reprex package (v2.0.0)
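A possible variation on the reprex above, offered as a sketch rather than a recommended recipe: `optim()` searches an unconstrained space by default, so it may propose a negative `sd`. Optimising over `log(sd)` avoids that. The `log_sd` parameter name, the seed, and the simulated data are assumptions made for illustration; only `log_likelihood()` and `dist_normal()` are taken from the example above. The result is compared against the closed-form normal MLE as a sanity check.

```r
library(distributional)

set.seed(1)  # fixed seed so the sketch is repeatable
y <- rnorm(1000, mean = 2, sd = 3)

# par[1] is the mean, par[2] is log(sd); exp() keeps sd strictly positive
normal_log_likelihood <- function(par, y) {
  log_likelihood(dist_normal(par[1], exp(par[2])), list(y))
}

fit <- optim(
  par = c(mean = mean(y), log_sd = log(sd(y))),
  fn = normal_log_likelihood,
  y = y,
  control = list(fnscale = -1)  # maximise instead of minimise
)

# Back-transform to the natural scale
estimate <- c(mean = unname(fit$par[1]), sd = unname(exp(fit$par[2])))

# Closed-form MLE for a normal: sample mean and the 1/n standard deviation
closed_form <- c(mean = mean(y), sd = sqrt(mean((y - mean(y))^2)))

print(rbind(estimate, closed_form))
```

For the normal distribution the numerical fit is unnecessary because the closed form exists; the same pattern is more useful for distributions without one, where the transformation would need to match each parameter's constraints.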