oobianom / quickcode

An R package made out of mine and Brice's scrapbook of much needed functions.
https://quickcode.obi.obianom.com

Version 0.8: Check if data fits a distribution #19

Closed oobianom closed 3 weeks ago

oobianom commented 5 months ago

Some other new functions that I am working on for version 0.8 include a set of functions to check whether data fit a particular distribution. I think this may be useful in several fields, including pharmacometrics.

Below are a few thoughts. These are also in the current GitHub update; you can install it and check the current documentation with ??quickcode::is.log

is.lognormal(data, sig = 0.5)

is.normal(data, sig = 0.5)

is.uniform(data, sig = 0.5)

is.poisson(data, sig = 0.5)

is.gamma(data, sig = 0.5)
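
To illustrate what a check of this kind could look like, here is a minimal sketch built on the Shapiro-Wilk test from base R. The name `is_normal_sketch` and the `alpha` argument are hypothetical and are not the quickcode implementation; in particular, `alpha` is a conventional significance threshold and is only assumed to play a role similar to `sig` above.

```r
# Hypothetical helper, NOT the quickcode implementation.
# Returns TRUE when the Shapiro-Wilk test does not reject normality at `alpha`.
is_normal_sketch <- function(data, alpha = 0.05) {
  x <- data[!is.na(data)]                      # drop missing values
  # shapiro.test() only accepts between 3 and 5000 observations
  stopifnot(is.numeric(x), length(x) >= 3, length(x) <= 5000)
  p <- stats::shapiro.test(x)$p.value
  p > alpha                                    # failing to reject is not proof of normality
}

# Usage: ideal normal quantiles versus a strongly skewed sample
is_normal_sketch(qnorm(ppoints(100)))
is_normal_sketch(exp(seq(0, 10, length.out = 100)))
```

A TRUE result only means the test found no evidence against normality, so small samples will tend to pass almost any check of this form.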

What do you think, Brice? Does this already exist somewhere?

brichard1638 commented 5 months ago

As far as I can tell, there is nothing in R that replicates your idea on distributions. However, take that affirmation as a relative truth. My tech tool does not capture or focus on R packages that relate directly to distribution-based functions. While I do possess some of that content, distributions play a secondary role in my CRAN-managed tool. The closest packages I have that are at least similar to what you are trying to do are the sn and visualize packages.

I would say that, on balance, your idea to create a function designed to identify a particular distribution is worth pursuing further.

If I were going to build a function like that, instead of testing every distribution in the universe one by one, I would focus on a function where the user could just pass a single dataset and the function would internally determine the distribution that most likely fits it.

The logic of this approach is that if the user does not know which distribution the data most likely represent, he/she will not know which function to test them against. I don't think it would make the user very happy to have to test against every possible distribution function found in your quickcode package.

A better function would be: getDistribution(data)

Perhaps a small dataframe could be returned indicating the top 3-5 distributions most likely represented along with a percentage to show the user how likely the match is.
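
A rough sketch of that idea, using only base R: fit a few candidate distributions by maximum likelihood, rank them by AIC, and report Akaike weights as the "percentage". Everything here is an assumption for illustration: the name `get_distribution_sketch`, the three candidate distributions, and the use of Akaike weights, which express relative support among the candidates tried rather than a true probability of a match.

```r
# Hypothetical sketch, NOT the quickcode getDistribution().
# Candidates (normal, lognormal, exponential) are chosen because their
# maximum-likelihood estimates have closed forms.
get_distribution_sketch <- function(data, top = 3) {
  x <- data[!is.na(data)]
  stopifnot(is.numeric(x), length(x) >= 5, length(unique(x)) > 1)

  fits <- list()
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))                   # MLE of the standard deviation
  fits$normal <- c(loglik = sum(dnorm(x, m, s, log = TRUE)), k = 2)

  if (all(x > 0)) {                            # these two need positive data
    lx <- log(x)
    ml <- mean(lx)
    sl <- sqrt(mean((lx - ml)^2))
    fits$lognormal   <- c(loglik = sum(dlnorm(x, ml, sl, log = TRUE)), k = 2)
    fits$exponential <- c(loglik = sum(dexp(x, rate = 1 / m, log = TRUE)), k = 1)
  }

  # AIC = 2k - 2 * log-likelihood; lower is better
  aic <- vapply(fits, function(f) 2 * f[["k"]] - 2 * f[["loglik"]], numeric(1))
  w <- exp(-(aic - min(aic)) / 2)              # unnormalised Akaike weights

  out <- data.frame(
    distribution = names(aic),
    aic = unname(aic),
    weight_pct = round(100 * unname(w) / sum(w), 1),
    stringsAsFactors = FALSE
  )
  out <- out[order(out$aic), ]
  rownames(out) <- NULL
  head(out, top)
}

# Usage: simulated lognormal data
set.seed(123)
get_distribution_sketch(rlnorm(500, meanlog = 1, sdlog = 0.5))
```

Extending the candidate list to distributions without closed-form estimates (gamma, for example) would need a numerical fitter, which this sketch leaves out.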

If you decide on this approach, I can provide a listing of really interesting distributions to consider. You've got some classic distributions listed, but R offers some really interesting distributions that you may not be aware of or familiar with.

oobianom commented 5 months ago

Thanks Brice, I agree with your assessment and with 'getDistribution(data)', but in addition to it I'd like to have the individual functions too, because sometimes (e.g. in my field) users ask specific questions such as 'is this data lognormally distributed'. On that note, I will continue to work on the functions, and I am looking forward to your 'listing of really interesting distributions to consider'.
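
For a targeted question like 'is this data lognormally distributed', one simple approach is to test the log-transformed values for normality. The sketch below does that with the Shapiro-Wilk test; the name `is_lognormal_sketch` and the `alpha` threshold are hypothetical and do not describe how quickcode's is.lognormal works.

```r
# Hypothetical helper, NOT the quickcode implementation.
# Data are lognormal exactly when log(data) is normal, so test log(data).
is_lognormal_sketch <- function(data, alpha = 0.05) {
  x <- data[!is.na(data)]
  stopifnot(is.numeric(x), length(x) >= 3, length(x) <= 5000)
  if (any(x <= 0)) return(FALSE)               # lognormal values are strictly positive
  stats::shapiro.test(log(x))$p.value > alpha
}

# Usage: ideal lognormal quantiles versus data containing a negative value
is_lognormal_sketch(exp(qnorm(ppoints(100))))
is_lognormal_sketch(c(-1, 2, 3, 4))
```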

oobianom commented 3 months ago

Will need to get this getDistribution(data) ready for the next release.