openghg / openghg

A cloud platform for greenhouse gas (GHG) data analysis and collaboration.
https://www.openghg.org
Apache License 2.0

Improve resampling of "variability" #1154

Open brendan-m-murphy opened 1 month ago

brendan-m-murphy commented 1 month ago

What is your issue?

If obs data already has "variability" and "number of obs" variables, because it was resampled from some higher initial frequency to the frequency of the data we receive, we can improve our resampling calculation for mf_variability.

One expression for the variance of a sequence of values $x_1,\ldots,x_n$ is

$$(\mbox{variance}) = \frac 1n\sum_{i=1}^n x_i^2 - \left(\frac 1n\sum_{i=1}^n x_i\right)^2.$$
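As a quick sanity check of this identity, here is a minimal NumPy sketch. The values are made up for illustration. Note that the expression is the population variance (divide by $n$), which matches NumPy's default `ddof=0`; if the stored "variability" was computed with `ddof=1`, the formulas here would need adjusting.

```python
import numpy as np

# Made-up mole fraction values, for illustration only.
x = np.array([410.2, 411.0, 409.8, 410.5, 412.1])

# Mean of the squares minus the square of the mean.
variance = np.mean(x**2) - np.mean(x) ** 2

# np.var uses ddof=0 by default, i.e. the population variance.
assert np.isclose(variance, np.var(x))
```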

Suppose the (e.g. hourly) data we add to the object store was resampled from more frequent raw data. If there are $n_j$ raw values in period $j$, which we label $x_{1,j},\ldots,x_{n_j,j}$, then

\begin{align*}
\bar x_j = (\mbox{value in period $j$}) &= \frac 1{n_j}\sum_{i=1}^{n_j} x_{i,j} \\
(\mbox{variability in period $j$})^2 &= \frac 1{n_j}\sum_{i=1}^{n_j} x_{i,j}^2 - \left(\frac 1{n_j}\sum_{i=1}^{n_j} x_{i,j}\right)^2 \\
&= \frac 1{n_j}\sum_{i=1}^{n_j} x_{i,j}^2 - \bar x_j^2,
\end{align*}

so

$$(\mbox{variability in period $j$})^2 = \frac 1{n_j}\sum_{i=1}^{n_j} x_{i,j}^2 - \bar x_j^2.$$

Thus

$$\sum_{i=1}^{n_j} x_{i,j}^2 = n_j\cdot \left((\mbox{variability in period $j$})^2 + \bar x_j^2 \right).$$

So to resample over several periods, say $j=1,2,3,4$, we can calculate

\begin{align*}
(\mbox{number of obs}) & = n = \sum_{j=1}^4 n_j \\
(\mbox{resampled value}) &= \bar x = \frac 1n\sum_{j=1}^4 n_j \cdot \bar x_j \\
(\mbox{resampled variability})^2 &= \frac 1n \sum_{j=1}^4 \sum_{i=1}^{n_j} x_{i,j}^2 - \bar x^2 \\
&= \frac 1n\left(\sum_{j=1}^4 n_j\cdot \left((\mbox{variability in period $j$})^2 + \bar x_j^2 \right)\right)  - \bar x^2
\end{align*}
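The calculation above could be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions: `pool_resampled` is a hypothetical helper, not an existing OpenGHG function, and it assumes "variability" is a population standard deviation (`ddof=0`) and that every period has at least one observation. The check at the end compares the pooled result against statistics computed directly from synthetic raw data.

```python
import numpy as np


def pool_resampled(means, variabilities, counts):
    """Combine per-period means, variabilities and counts into one period.

    Hypothetical helper for illustration; assumes `variabilities` are
    population standard deviations (ddof=0).
    """
    means = np.asarray(means, dtype=float)
    variabilities = np.asarray(variabilities, dtype=float)
    counts = np.asarray(counts, dtype=float)

    n = counts.sum()
    # Count-weighted mean of the per-period means.
    pooled_mean = (counts * means).sum() / n
    # Recover the sum of squares of the raw values in each period, then add up.
    sum_of_squares = (counts * (variabilities**2 + means**2)).sum()
    pooled_variance = sum_of_squares / n - pooled_mean**2
    # Rounding can leave a tiny negative value when the true variance is ~0.
    pooled_variance = max(pooled_variance, 0.0)
    return pooled_mean, np.sqrt(pooled_variance), int(n)


# Synthetic raw data for four periods with differing numbers of obs.
rng = np.random.default_rng(0)
raw = [rng.normal(410.0, 1.0, size=k) for k in (12, 7, 10, 9)]

means = [r.mean() for r in raw]
stds = [r.std() for r in raw]  # ddof=0
counts = [r.size for r in raw]

value, variability, n_obs = pool_resampled(means, stds, counts)

# The pooled statistics should match those of the concatenated raw data.
all_raw = np.concatenate(raw)
assert np.isclose(value, all_raw.mean())
assert np.isclose(variability, all_raw.std())
```

Weighting by `counts` is what distinguishes this from simply averaging the per-period variabilities, which ignores both the differing numbers of obs and the spread between the period means.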