jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

Multiple function type usage in regression analysis [Feature Request]: #2486

Closed: Franck6S closed this issue 9 months ago

Franck6S commented 11 months ago

Description

Extend regression analysis so that multiple factors can be modeled with different types of functions

Purpose

To be able to model the relationship between an outcome and multiple factors using several basic mathematical functions

Use-case

Based on a dataset of factors, I need to predict future results using a model fitted by regression analysis.

Is your feature request related to a problem?

I need to be able to obtain a model from a dataset of factors.

Is your feature request related to a JASP module?

Regression

Describe the solution you would like

Given a dataset of factors, generate a model (an equation in the several factors) together with its statistical parameters (R², confidence intervals, ...), so that the output can be predicted from the fitted model.

Describe alternatives that you have considered

No response

Additional context

Here is an example generated with Excel for a polynomial model: [image] The expectation is that this would be developed with several possible mathematical functions (polynomial, logarithmic, exponential).

Currently, I can only find linear regression: [image]
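For reference, here is a minimal sketch of how the three requested curve types (polynomial, logarithmic, exponential) can each be fitted as a linear regression on transformed terms in R, the language of the code snippets later in this thread. The data set `d` and the names `x1`, `fit_poly`, `fit_log`, `fit_exp`, and `predict_exp` are hypothetical. Fitting the exponential curve on the log scale is one common approach, not necessarily what Excel or JASP does internally.

```r
# Hypothetical example data (not from the issue): one predictor x1 and a
# positive outcome y, so that log(x1) and log(y) are defined.
set.seed(1)
d <- data.frame(x1 = seq(1, 20, length.out = 40))
d$y <- 3 * exp(0.1 * d$x1) * exp(rnorm(40, sd = 0.05))

# Polynomial (quadratic) trend: y = b0 + b1*x1 + b2*x1^2
fit_poly <- lm(y ~ x1 + I(x1^2), data = d)

# Logarithmic trend: y = b0 + b1*log(x1)
fit_log <- lm(y ~ log(x1), data = d)

# Exponential trend y = a * exp(b * x1), fitted as a straight line
# on the log scale: log(y) = log(a) + b * x1
fit_exp <- lm(log(y) ~ x1, data = d)
a_exp <- exp(coef(fit_exp)[[1]])
b_exp <- coef(fit_exp)[[2]]

# R-squared of each model (for fit_exp it is computed on log(y), not y)
sapply(list(poly = fit_poly, log = fit_log, exp = fit_exp),
       function(m) summary(m)$r.squared)

# Predict y for a new x1 value with the exponential equation
predict_exp <- function(x) a_exp * exp(b_exp * x)
predict_exp(15)
```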

tomtomme commented 10 months ago

On a graphical level you can use "visual modeling => flexplot" if you want to fit non-linear models.

And for non-linear regression models using log or sqrt links for count data (e.g., timelines), you can use the "generalized linear model" with the "poisson" family, I guess, but I may be mistaken here, since I have not really used the generalized models yet. And it is definitely missing a simple scatter plot that can fit non-linear curves. It is a bit strange that one has to go to flexplot for that. Also, I cannot find exponential and other common non-linear functions in that module. Which ones do you need specifically?
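As a rough illustration of the suggestion above, here is a sketch in R of a Poisson generalized linear model for count data. With the default log link, it fits an exponential curve for the expected count. The data and the names `week` and `count` are hypothetical. The JASP/flexplot interface may expose this differently.

```r
# Hypothetical count data (not from the issue): counts growing over time.
set.seed(2)
d <- data.frame(week = 1:30)
d$count <- rpois(30, lambda = exp(0.5 + 0.08 * d$week))

# Poisson GLM with the default log link: log(E[count]) = b0 + b1 * week,
# i.e. the exponential curve E[count] = exp(b0) * exp(b1 * week).
# A square-root link would be family = poisson(link = "sqrt").
fit <- glm(count ~ week, family = poisson, data = d)
summary(fit)          # estimates, standard errors, p-values
confint.default(fit)  # Wald confidence intervals for b0 and b1

# Expected count for a new week, on the original count scale
predict(fit, newdata = data.frame(week = 35), type = "response")
```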

tomtomme commented 9 months ago

@Franck6S Do you have any updates for me regarding my points above? Also, your request for polynomial regression is already tracked here: https://github.com/jasp-stats/jasp-issues/issues/172, and some of your other points here: https://github.com/jasp-stats/jasp-issues/issues/2138

Would you agree that we can close this one as a duplicate of those, then?

Franck6S commented 9 months ago

Dear all, I have not found a way to do this yet, even with your inputs: [image] To give more insight: starting from historical datasets, the purpose is to use JASP in a simple way so that it provides the corresponding regression equation (including several model types such as polynomial, not only linear). As a concrete real-life example, suppose you want the regression equation that lets you predict the price of a part from a past dataset. You know that the price depends on relevant parameters, some of them linked to the volume of the part, the type of material, and the treatment. I want to be able to do it as illustrated in the following picture: [image]

Franck6S commented 9 months ago

The outcome is "y", and there should be several inputs X1, X2, ... depending on the case study. Here there is only X1 in the Excel example.
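To make the request concrete, here is a sketch in R of the workflow described: an outcome `y`, several inputs, an equation with a polynomial term, R², confidence intervals, and a prediction for new inputs. The data are simulated, and the names `X1`, `X2`, and the material levels `"steel"`/`"aluminium"` are hypothetical stand-ins for inputs like part volume and material type. This is not JASP's implementation.

```r
# Hypothetical historical data (not from the issue): outcome y, a numeric
# input X1 and a categorical input X2 (e.g. material type).
set.seed(3)
n <- 50
d <- data.frame(X1 = runif(n, 1, 10),
                X2 = factor(sample(c("steel", "aluminium"), n, replace = TRUE)))
d$y <- 5 + 2 * d$X1 + 0.3 * d$X1^2 + ifelse(d$X2 == "steel", 4, 0) + rnorm(n)

# Model equation: y = b0 + b1*X1 + b2*X1^2 + b3*[X2 == "steel"]
fit <- lm(y ~ X1 + I(X1^2) + X2, data = d)

coef(fit)               # coefficients of the equation
summary(fit)$r.squared  # R-squared
confint(fit)            # 95% confidence intervals of the coefficients

# Predicted y with a 95% prediction interval for new inputs
new <- data.frame(X1 = 7, X2 = factor("steel", levels = levels(d$X2)))
p <- predict(fit, newdata = new, interval = "prediction")
p
```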

tomtomme commented 9 months ago

@Franck6S Yes, we need this. Thanks for the details. So this is clearly a duplicate of https://github.com/jasp-stats/jasp-issues/issues/172; let's discuss there!

Franck6S commented 9 months ago

Perfect! I hope this enhancement will be included in the coming version...

dustinfife commented 9 months ago

Flexplot/Linear Modeling can currently do:

Generalized linear modeling (fourth option in the visual modeling module) can do:

It's important for these discussions to understand the difference between nonlinear regression (see ?nlm in R) and distribution families/links (see ?family in R, and their implementation in ?glm). In short:

Nonlinear functions (nlm in R) allow the user to specify any functional relationship between predictor(s) and outcome. An example of a nonlinear equation would be a cosine curve or something like that. This has a steep learning curve because you basically have to create your own custom functions to feed into the algorithm and specify a "loss function"; the algorithm then minimizes the loss (like least squares does in regression). As far as I understand, this will only give you the parameters and won't give you any inferences (e.g., confidence intervals, p-values, BFs).
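As an illustration of the steps just described, here is a sketch in R that fits a cosine curve with `nlm`. It writes a custom model, wraps it in a sum-of-squares loss function, and lets `nlm` minimize that loss from user-chosen starting values. The data, the parameterization `a + b*cos(w*x)`, and the starting values are hypothetical. As noted above, the result is parameter estimates only.

```r
# Hypothetical data following a cosine curve: y = a + b*cos(w*x) + noise
set.seed(4)
x <- seq(0, 6, length.out = 100)
y <- 1 + 2 * cos(1.5 * x) + rnorm(100, sd = 0.2)

# Loss function: sum of squared residuals for parameters p = (a, b, w)
sse <- function(p) sum((y - (p[1] + p[2] * cos(p[3] * x)))^2)

# nlm() minimizes the loss starting from a guess; nonlinear fits can be
# sensitive to these starting values (other starts may hit local minima).
fit <- nlm(sse, p = c(0, 1, 1.4))
fit$estimate  # fitted a, b, w: point estimates, no standard errors
```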

Distribution families allow users to specify how residuals are distributed. For example, you can specify your residuals follow a Gamma distribution instead of a normal distribution. These are very important to get right if you're trying to make inferences.

Link functions are mathematical functions that map predictor(s) onto an outcome, very much like nonlinear functions in nlm. In regression, the "link function" is

$y = BX$ (in matrix notation), or, with a single predictor, $y = b_0 + b_1X$

It's called the "identity" link in R. You can "hack" regular regression to do polynomial regression, like this:

$y = b_0 + b_1X + b_2X^2$

Where $X^2$ is literally a new variable equal to the square of $X$. You can do this in R with lm(y~x + I(x^2), data=d). This is what flexplot is doing when a user chooses quadratic.
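To check the "hack" described above, here is a sketch in R showing that squaring `x` inside the formula with `I()` gives the same estimates as creating a new variable equal to `x` squared. Both are ordinary identity-link regressions. The data and the name `x2` are hypothetical.

```r
# Hypothetical data with a curved (quadratic) relationship
set.seed(5)
d <- data.frame(x = runif(40, 0, 5))
d$y <- 1 + 0.5 * d$x + 0.8 * d$x^2 + rnorm(40)

# Option 1: square x inside the formula with I()
fit_I <- lm(y ~ x + I(x^2), data = d)

# Option 2: literally create a new variable equal to x squared
d$x2 <- d$x^2
fit_new <- lm(y ~ x + x2, data = d)

# Same b0, b1, b2 from both fits
cbind(coef(fit_I), coef(fit_new))
```

`I()` is needed because `^` inside an R formula means crossing of terms. So `y ~ x + x^2` would silently reduce to `y ~ x`.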

In logistic regression, the link function is called the logit, which is:

$y = \frac{e^{BX}}{1 + e^{BX}}$

And the R code is glm(y~x, data=d, family=binomial(link="logit")). For gamma regression, it's the inverse link:

$y = \frac{1}{BX}$
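A sketch in R can make these two links concrete. For a fitted `glm`, the response-scale predictions equal the inverse link applied to the linear predictor $BX$. For the logit, that is $e^{BX}/(1 + e^{BX})$. For the Gamma inverse link, it is $1/(BX)$. The simulated data and all variable names are hypothetical.

```r
set.seed(6)
n <- 200
x <- runif(n, 0, 2)

# Logistic regression: logit(p) = log(p / (1 - p)) = BX,
# so the fitted probability is p = exp(BX) / (1 + exp(BX)).
yb <- rbinom(n, 1, plogis(-1 + 1.5 * x))
fit_logit <- glm(yb ~ x, family = binomial(link = "logit"))
eta <- predict(fit_logit, type = "link")      # BX
p   <- predict(fit_logit, type = "response")  # fitted probabilities
all.equal(p, exp(eta) / (1 + exp(eta)))

# Gamma regression with the inverse link: 1 / mu = BX, so mu = 1 / (BX).
mu_true <- 1 / (0.5 + 0.5 * x)
yg <- rgamma(n, shape = 5, rate = 5 / mu_true)  # mean = shape / rate = mu_true
fit_gamma <- glm(yg ~ x, family = Gamma(link = "inverse"))
all.equal(predict(fit_gamma, type = "response"),
          1 / predict(fit_gamma, type = "link"))
```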

These link functions tend to be associated with certain families. So, the logit function is usually used with a binomial distribution. The log function is used with Poisson and negative binomial models. The inverse is used with Gamma regression.

Base R has provided the glm function, which allows users to specify different families/links. When you choose the generalized linear model option in flexplot, you're actually using glm in the background. Then, you choose the family and the link is picked for you based on R's defaults.
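The family-to-link defaults mentioned above can be inspected directly in base R. Here is a short sketch. The negative binomial family is not included because it comes from an add-on package (MASS) rather than base R.

```r
# Default link of each base-R family when no link is given explicitly
links <- sapply(list(gaussian = gaussian(), binomial = binomial(),
                     poisson = poisson(), Gamma = Gamma()),
                function(f) f$link)
links
# returns c(gaussian = "identity", binomial = "logit",
#           poisson = "log", Gamma = "inverse")

# A non-default link can still be requested explicitly, e.g.:
poisson(link = "sqrt")$link
```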

Link functions are basically the same thing as nonlinear functions in nlm. R just packaged together links/families so you can actually make inferences.

So, if someone wants some other function that doesn't fit into any of the above (e.g., cosine), they'll have to write a custom function and pair that with nlm. If they want inferences too, I actually don't know how they'd go about doing that. (When I've had to do this in the past, I've just gone Bayesian).

And there's also transformations, which are entirely different than all these....
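One way to see that a transformation differs from a link function is the following R sketch. Regressing `log(y)` on `x` models the mean of the log of `y`. A log link instead models the log of the mean of `y`, with errors on the original scale. The two generally give different estimates. The simulated data and names are hypothetical, and the `pmax` clamp is only there to keep `log(y)` defined.

```r
set.seed(7)
x <- runif(100, 0, 3)
y <- exp(0.5 + 0.7 * x) + rnorm(100, sd = 0.5)  # additive noise on the y scale
y <- pmax(y, 0.1)  # hypothetical safeguard: keep y positive for log(y)

# Transformation: ordinary regression of log(y), i.e. models E[log(y)]
fit_trans <- lm(log(y) ~ x)

# Link function: models log(E[y]) with normal errors on the y scale
fit_link <- glm(y ~ x, family = gaussian(link = "log"),
                start = coef(fit_trans))

rbind(transformation = coef(fit_trans), log_link = coef(fit_link))
```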

tomtomme commented 9 months ago

@Franck6S This is already doable in Visual Modeling => Linear Modeling! dustin just explained how to do it in #172. We were just blind...

Franck6S commented 9 months ago

Perfect. I just made a quick trial. I will run some tests in the coming months. [image]