Open jack-davison opened 1 year ago
Hi @jack-davison and @davidcarslaw ,
I have done some comparison with two packages I have found that allow for theilSen calculations: mblm
and RobustLinearReg
. Both these functions can be inserted in geom_smooh()
using the following approach:
library(mblm)
library(openair)
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mydata <- openair::mydata
# define the function
sen <- function(..., weights = NULL) {
#RobustLinearReg::theil_sen_regression(...)
mblm::mblm(..., repeated = FALSE)
}
p1 <- mydata |>
openair::timeAverage(avg.time = "month") |>
ggplot2::ggplot(aes(date, nox)) +
ggplot2::geom_point() +
ggplot2::geom_line() +
ggplot2::geom_smooth(method = sen) +
ggplot2::theme_minimal()
p1
#> `geom_smooth()` using formula = 'y ~ x'
# Do the openair plot:
theilsen_data <- TheilSen(mydata,
pollutant = "nox",
avg.time = "month",
statistic = "mean",
deseason = FALSE)
#> Taking bootstrap samples. Please wait.
# Combine the data into one plot
p2 <- p1 +
geom_abline(slope = theilsen_data$data$res2$slope/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept,
color = "red",
linewidth = 1) +
geom_abline(slope = theilsen_data$data$res2$lower/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept.lower,
color = "red",
linewidth = 0.5,
linetype = "dashed") +
geom_abline(slope = theilsen_data$data$res2$upper/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept.upper,
color = "red",
linewidth = 0.5,
linetype = "dashed")
p2
#> `geom_smooth()` using formula = 'y ~ x'
###############################################
# check the outcomes of slope and intercepts:
###############################################
# Calculate the model directly
mblm_data <- mydata |>
dplyr::select(date, nox) |>
openair::timeAverage(avg.time = "month") |>
na.omit() |> # needed as mblm cannot handle NA
dplyr::mutate(date = as.numeric(date)) # get the seconds
model <- mblm::mblm(formula = nox~date,
mblm_data,
repeated = FALSE)
# slope from theilSen openair:
theilsen_data$data$res2$slope
#> [1] -8.44159
# slope from mblm
model$coefficients[[2]] * (60*60*24*365.2422)
#> [1] -8.447192
# confidence intervals of the slope
c(theilsen_data$data$res2$lower, theilsen_data$data$res2$upper)
#> [1] -11.671282 -5.439937
mblm::confint.mblm(model)[2,] * (60*60*24*365.2422)
#> 0.025 0.975
#> -1325.990 1364.582
confint.default(model)[2,] * (60*60*24*365.2422)
#> 2.5 % 97.5 %
#> -11.95284 -4.94154
# The latter looks more like the ones from theilSen() in openair. The first
# CI's are weird to say the least.
# intercept from theilSen openair:
theilsen_data$data$res2$intercept
#> [1] 441.5602
# intercept from mblm
model$coefficients[[1]]
#> [1] 439.9073
Created on 2023-12-20 with reprex v2.0.2
In general, we can see that the slope is very similar, and there might be a difference as I might not have used the correct correction factor (number of seconds in a year). There is a slight offset in the intercept which I cannot explain fully.
Some other observations from my testing:
mblm
requires the repeated = FALSE
parameter to get results similar to theilSen()
.mblm
has some issues with NA values in the data, so these need to be removed first. See also at the bottom here,mblm::confint.mblm(model)
makes no sense, the confint.default
looks better (see reprex). Possible reason for differences in confidence interval is that theilSen()
uses bootstrapping, and the other functions might not.RobustLinearReg
fails to provide confidence intervals, possibly due to identifying itself as a lm
model.
This is the one
openair
function that doesn't have aggopenair
equivalent, in part because it is the one that feels most like it should be ageom
orstat
.One could imagine doing the below, for example:
Need to check if this already exists somewhere else or, if not, need to confirm how to implement it.
If it is the case that
geom_smooth()
can achieve this straightforwardly, then there may need to be no need to add this at all (in the same way thatsmoothTrend()
doesn't need migrating.