fitAR() results & estimates for the same value in the time series

Rapsodia86 commented 1 year ago

Describe the bug Hello!

I have run the fitAR() model over a raster stack and have now started exploring the results. In that raster stack, there is a bunch of pixels that have the same value across all the layers. For those pixels, I get a tiny value of coefficient and low p-value (see example below). I know that the model "estimates parameters for the regression model with AR(1) random error terms", but I find this result confusing and false positive output. The estimate is rather on the random error and that is an artificial result. Of course, I can round up the coefficient estimate and then get 0, but the p-value still exists. But I do not see a point in generating that result at all. For the same raster stack, and those stable pixels, while applying the Mann-Kendal test, I get 0 for the coefficient and p-value 1.

Maybe in case a function fitAR() is applied on the vector consisting of the same values, the fitting should be skipped, and the output just gives the coeff = 0, and p-value = 1? Unless that would further affect the spatiotemporal part of the modeling? Maybe adding a condition like skip.same = TRUE/FALSE would either yield for TRUE: coeff = 0, p-value =1; and for FALSE: regular model fit.

Thanks! To Reproduce

x <- c(rep(46,20))
t <- 1:20
AR.time <- fitAR(x ~ t)
Warning messages:
1: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower,  :
  NA/Inf replaced by maximum positive value
2: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower,  :
  NA/Inf replaced by maximum positive value
summary(AR.time)

Call:
fitAR(formula = x ~ t)
Residuals:
       Min         1Q     Median         3Q        Max 
-3.553e-14 -2.132e-14 -7.105e-15  1.776e-15  1.421e-14 
Coefficients:
              Estimate Std. Error   t value Pr(>|t|)    
(Intercept)  4.600e+01  7.335e-29 5.371e+15  < 2e-16 ***
t           -2.490e-15  5.157e-31 3.467e+00  0.00275 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Mean squared error: 0
Log-likelihood: 577.7417

Desktop (please complete the following information):

OS: Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows 10 x64 (build 19045)
R: R version 4.2.2 (2022-10-31 ucrt)
Package version: 1.0.0

arives commented 1 year ago

Monika,

It looks like fitAR() is just fitting the machine error in the numbers you feed it. Since it doesn’t make sense to perform a regression on values that don’t change, I’d just set coefficients to zero, as you suggest. When we update remotePARTS (which is now on CRAN), we’ll try to remember to note this in the vignette. I don’t think we want to do this automatically, because researchers should know this about their data. We could flag this (such as return NA), although this might lead to more confusion.

Cheers, Tony

Anthony R. Ives (he/him/his) UW-Madison 459 Birge Hall 608-262-1519

From: Monika Anna Tomaszewska @.> Date: Friday, October 6, 2023 at 1:53 PM To: morrowcj/remotePARTS @.> Cc: Subscribed @.***> Subject: [morrowcj/remotePARTS] fitAR() results & estimates for the same value in the time series (Issue #27)

Describe the bug Hello!

I have run the fitAR() model over a raster stack and have now started exploring the results. In that raster stack, there is a bunch of pixels that have the same value across all the layers. For those pixels, I get a tiny value of coefficient and low p-value (see example below). I know that the model "estimates parameters for the regression model with AR(1) random error terms", but I find this result confusing and false positive output. The estimate is rather on the random error and that is an artificial result. Of course, I can round up the coefficient estimate and then get 0, but the p-value still exists. But I do not see a point in generating that result at all. For the same raster stack, and those stable pixels, while applying the Mann-Kendal test, I get 0 for the coefficient and p-value 1.

Maybe in case a function fitAR() is applied on the vector consisting of the same values, the fitting should be skipped, and the output just gives the coeff = 0, and p-value = 1? Unless that would further affect the spatiotemporal part of the modeling? Maybe adding a condition like skip.same = TRUE/FALSE would either yield for TRUE: coeff = 0, p-value =1; and for FALSE: regular model fit.

Thanks! To Reproduce

x <- c(rep(46,20))

t <- 1:20

AR.time <- fitAR(x ~ t)

Warning messages:

1: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower, :

NA/Inf replaced by maximum positive value

2: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower, :

NA/Inf replaced by maximum positive value

summary(AR.time)

Call:

fitAR(formula = x ~ t)

Residuals:

   Min         1Q     Median         3Q        Max

-3.553e-14 -2.132e-14 -7.105e-15 1.776e-15 1.421e-14

Coefficients:

          Estimate Std. Error   t value Pr(>|t|)

(Intercept) 4.600e+01 7.335e-29 5.371e+15 < 2e-16 ***

t -2.490e-15 5.157e-31 3.467e+00 0.00275 **

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mean squared error: 0

Log-likelihood: 577.7417

Desktop (please complete the following information):

OS: Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows 10 x64 (build 19045)
R: R version 4.2.2 (2022-10-31 ucrt)
Package version: 1.0.0

— Reply to this email directly, view it on GitHubhttps://github.com/morrowcj/remotePARTS/issues/27, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACYX6LEUDCDOL7STKUWKY5DX6BHR3AVCNFSM6AAAAAA5WHBYNCVHI2DSMVQWIX3LMV43ASLTON2WKOZRHEZTANZYGYZTQNQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>

Rapsodia86 commented 1 year ago

Will you also change the p-value? I am using a threshold on p-value for data filtering and that artificial value is not needed as well. What do you mean by automatically? Setting a zero for the coefficient and 1 for the p-value maybe with a comment in the output that values do not change should give a clear message.

Thanks a lot for the quick answer:)

Best, Monika

arives commented 1 year ago

Yes, I’d just set the P-value to 1.

For this problem, I think the best thing is to have researchers check their data before applying fitAR(). If you are using fitAR() on pixels through a loop, individual comments w’n't be kept with the values.

Cheers, Tony

Anthony R. Ives (he/him/his) UW-Madison 459 Birge Hall 608-262-1519

From: Monika Anna Tomaszewska @.> Date: Friday, October 6, 2023 at 2:15 PM To: morrowcj/remotePARTS @.> Cc: Anthony R. Ives @.>, Comment @.> Subject: Re: [morrowcj/remotePARTS] fitAR() results & estimates for the same value in the time series (Issue #27)

Will you also change the p-value? I am using a threshold on p-value for data filtering and that artificial value is not needed as well. What do you mean by automatically? Setting a zero for the coefficient and 1 for the p-value maybe with a comment in the output that values do not change should give a clear message.

Thanks a lot for the quick answer:)

Best, Monika

— Reply to this email directly, view it on GitHubhttps://github.com/morrowcj/remotePARTS/issues/27#issuecomment-1751294569, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACYX6LG55QADM7R2WZI2SEDX6BKFXAVCNFSM6AAAAAA5WHBYNCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJRGI4TINJWHE. You are receiving this because you commented.Message ID: @.***>

morrowcj commented 1 year ago

As Tony mentioned, this is because it does not make sense to run a regression for a series of constants. The "true" value of the t-statistics, in this case, would be Inf (46/0) and NaN (0/0), according to the formula $|\beta| \div \sigma$.

> abs(round(AR.time$coefficients)) / round(AR.time$SE)
(Intercept)           t 
        Inf         NaN

We might consider putting a check for this sort of thing in the function, but I agree with Tony that we expect researchers to filter their own data.

The expectation is that all non-compatible data (missing values, constant over time, etc) are removed prior to analyzing.

morrowcj commented 1 year ago

Package version: 1.0.0

@Rapsodia86, If you are using 1.0.0, I'd recommend upgrading to the current stable version 1.0.4, which is on CRAN. We've fixed a number of bugs and improved the stability since 1.0.0.

Rapsodia86 commented 1 year ago

Of course, it makes no sense to run that regression on constants. But even if I know about that case, I would rather prefer to have a trend function handling this than filtering rasters up front. Because the fitAR() does not handle NA, right? On the other hand, I can wrap fitAR() in a small checking function when applying it on a raster stack; the question is how much longer it will run:

 AR_ts_fun <- function(x) {
    t<- 1:20
    if (any(is.na(x))){
    r_out <- c(NA,NA)}else if(
      all(x == x[1]))
    {r_out <- c(1,0)}else{  
    AR.time <- fitAR(x ~ t)
    r_out <- c(AR.time$pval[2],AR.time$coefficients[2])}
    return(r_out)
  }

Right, an individual comment would only work if someone prints the model summary; for running in the loop without extracting the info is not useful.

@morrowcj I will upgrade the package, thanks!

@arives @morrowcj Thanks again for working on this!

arives commented 1 year ago

fitAR() has no problems fitting time series with NAs, provided there are enough non-NA values (3).

Rapsodia86 commented 1 year ago

Okay, thanks for the clarification. In this specific scenario I am working on, I do not want to have any NA. It would make no sense to compare trends for pixels having different time series lengths.

arives commented 1 year ago

I’m curious. Why not? I would have thought it is pretty common to have NAs.

Rapsodia86 commented 1 year ago

It is common. Just in this instance, I do explore trends of a couple of variables constructed from the same dataset. So, I want to have consistent time-series and constant statistical power.

arives commented 1 year ago

Thanks.

morrowcj commented 1 year ago

I'm going to reopen this, because I do think it is a good feature to consider.

morrowcj / remotePARTS

fitAR() results & estimates for the same value in the time series #27